Abstract: Positivity of the prior probability of Kullback-Leibler neighborhoods around the true density, commonly known as the Kullback-Leibler property, plays a fundamental role in posterior consistency. A popular prior for Bayesian estimation is given by a Dirichlet mixture, where the kernels are chosen depending on the sample space and the class of densities to be estimated. The Kullback-Leibler property of the Dirichlet mixture prior has been shown for some special kernels like the normal density or Bernstein polynomial, under appropriate conditions. In this paper, we obtain easily verifiable sufficient conditions under which a prior obtained by mixing a general kernel possesses the Kullback-Leibler property. We study a wide variety of kernels used in practice, including the normal, t, histogram, gamma and Weibull densities, and show that the Kullback-Leibler property holds if some easily verifiable conditions are satisfied at the true density. This gives a catalog of conditions required for the Kullback-Leibler property, which can be readily used in applications.
1. Introduction

Density estimation, which is also relevant in various applications such as cluster analysis and robust estimation, is a fundamental nonparametric inference problem. In the Bayesian approach to density estimation, a prior such as a Gaussian process, a Polya tree process, or a Dirichlet mixture is constructed on the space of probability densities. Dirichlet mixtures were introduced by Ferguson [9] and Lo [21], who also obtained expressions for the resulting posterior and predictive distributions. West [30], West, Müller and Escobar [31] and Escobar and West [6; 7] developed powerful Markov chain Monte Carlo methods to calculate Bayes estimates and other posterior quantities for Dirichlet mixtures.
The priors of interest in this paper are of mixture type and can be described in terms of a kernel and a prior for the mixing distribution. Let $\mathfrak{X}$ be the sample space and $\Theta$ the space of the mixing parameter $\theta$. Let $K(x;\theta)$ be the kernel on $\mathfrak{X}\times\Theta$, i.e., $K(x;\theta)$ is a jointly measurable function such that for all $\theta$, $K(\cdot;\theta)$ is a probability density on $\mathfrak{X}$. The choice of an appropriate kernel depends on the underlying sample space $\mathfrak{X}$, on which the true density is defined. If $\mathfrak{X}$ is the entire real line, a location-scale kernel is appropriate. If $\mathfrak{X}$ is the unit interval, a uniform or triangular density kernel, or a Bernstein polynomial, may be considered. If $\mathfrak{X}$ is the positive half line $(0,\infty)$, mixtures of gamma, Weibull, lognormal, exponential or inverse gamma densities may be used. Petrone and Veronese [25] discussed the choice of a kernel in view of a constructive approximation known as the Feller sampling scheme. Let $P$, the mixing distribution on $\Theta$, be given a prior $\Pi$ on $\mathfrak{M}(\Theta)$, the space of probability measures on $\Theta$. Let $\mathrm{supp}(\Pi)$ denote the weak support of $\Pi$. The prior on $P$ and the chosen kernel then give rise to a prior on $\mathfrak{D}(\mathfrak{X})$, the space of densities on $\mathfrak{X}$, via the map $P\mapsto f_P(x):=\int K(x;\theta)\,dP(\theta)$. We shall call such a prior a type I mixture prior, or Prior 1 in short. To enrich the family of kernels, let the kernel function contain another parameter $\phi$, referred to as the hyper parameter. In this case, we shall denote the kernel by $K(x;\theta,\phi)$. The hyper parameter might be elicited a priori or be given a prior. In the former case, such a prior essentially reduces to Prior 1. For the latter case, assume that $\phi$ is independent of $P$ and denote the prior for $\phi$ by $\mu$. Let $\Phi$ be the space of $\phi$ and $\mathrm{supp}(\mu)$ denote the support of $\mu$. With such a random hyper parameter in the chosen kernel, the prior on densities is induced by $\mu\times\Pi$ via the map $(\phi,P)\mapsto f_{P,\phi}(x):=\int K(x;\theta,\phi)\,dP(\theta)$. We shall call this prior a type II mixture prior, or simply Prior 2. Clearly, Prior 2 contains Prior 1 as a special case where $\phi$ is treated as a vacuous parameter. In some situations, the prior $\Pi$ may contain an additional indexing parameter $\xi$. For instance, when $\Pi$ is the Dirichlet process with base measure $\alpha_\xi$ (written as $\mathrm{DP}(\alpha_\xi)$) depending on an indexing parameter $\xi$, which is also given a prior, we obtain a mixture of Dirichlet processes (MDP) [1] prior for the mixing distribution $P$. Adding this hierarchical structure to Prior 1 or Prior 2 gives somewhat more flexibility. In this paper, we do not make any specific assumption on $\Pi$, such as DP or MDP, other than requiring that it has large weak support. The prior induced on the space of densities by a mixing distribution $P\sim\Pi$ (and $\phi\sim\mu$ and $\xi\sim\pi$) will be denoted by $\Pi^*$, and we shall refer to it as a kernel mixture prior. Note that the variable $x$ and the parameters $\theta$, $\phi$ and $\xi$ mentioned above are not necessarily one-dimensional.
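To make the map $P\mapsto f_P$ of Prior 1 concrete, the following Python sketch draws one mixing distribution $P$ and evaluates $f_P(x)=\int K(x;\theta)\,dP(\theta)$. It is an illustration under stated assumptions only: $\Pi$ is taken to be a Dirichlet process approximated by Sethuraman's truncated stick-breaking representation (a device not used in this paper), $\theta$ stands for the pair (location, scale) of a normal location-scale kernel, and the base measure in the helper `base_sampler` is a hypothetical choice.

```python
import math
import random

def stick_breaking(alpha, base_sampler, truncation, rng):
    """Truncated stick-breaking draw from DP(alpha * G0): returns weights and atoms.

    The last stick takes whatever length remains, so the weights sum to one.
    """
    weights, atoms = [], []
    remaining = 1.0
    for k in range(truncation):
        v = rng.betavariate(1.0, alpha) if k < truncation - 1 else 1.0
        weights.append(remaining * v)
        remaining *= 1.0 - v
        atoms.append(base_sampler(rng))
    return weights, atoms

def normal_kernel(x, theta, h):
    # K(x; (theta, h)) = h^{-1} chi((x - theta) / h) with chi the standard normal density
    z = (x - theta) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def mixture_density(x, weights, atoms):
    # f_P(x) = int K(x; theta) dP(theta) for the discrete P = sum_k w_k delta_{atom_k}
    return sum(w * normal_kernel(x, theta, h) for w, (theta, h) in zip(weights, atoms))

def base_sampler(rng):
    # hypothetical base measure G0: location ~ N(0, 2^2), scale ~ Uniform(0.3, 1.5)
    return rng.gauss(0.0, 2.0), rng.uniform(0.3, 1.5)

rng = random.Random(1)
weights, atoms = stick_breaking(2.0, base_sampler, 50, rng)

# f_P should integrate to one; check with a Riemann sum on [-30, 30]
step = 0.01
total_mass = step * sum(mixture_density(-30.0 + step * i, weights, atoms) for i in range(6001))
```

Each draw of `weights, atoms` gives one random density $f_P$; repeating the draw samples from the induced kernel mixture prior $\Pi^*$ under these assumptions.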

Asymptotic properties, such as consistency and rate of convergence, of the posterior distribution based on kernel mixture priors were established by Ghosal, Ghosh and Ramamoorthi [11], Tokdar [29], and Ghosal and van der Vaart [13; 14], when the kernel is chosen to be a normal probability density (and the prior distribution of the mixing distribution is DP). Similar results for Dirichlet mixtures of Bernstein polynomials were shown by Petrone and Wasserman [26], Ghosal [10] and Kruijer and van der Vaart [19]. However, in the literature, there is a lack of such results for mixtures of other kernels, which are also widely used in practice. We are only aware of the article by Petrone and Veronese [25], who considered general kernels. However, they derived consistency only under the strong and unrealistic condition that the true density is exactly of the mixture type for some compactly supported mixing distribution, or that the true density itself is compactly supported and is approximated in terms of Kullback-Leibler divergence by its convolution with the chosen kernel.

Schwartz [28] showed that consistency at a true density $f_0$ holds if the prior assigns positive probabilities to a specific type of neighborhoods of $f_0$ defined by the Kullback-Leibler divergence and the size of the model is restricted in some appropriate sense. Thus the prior positivity condition, known as the Kullback-Leibler property (KL property), is fundamental in posterior consistency studies. More formally, let a density function $f$ be given a prior $\Pi^*$. Define a Kullback-Leibler neighborhood of $f$ of size $\epsilon$ by $\mathcal{K}_\epsilon(f)=\{g:\mathcal{K}(f;g)<\epsilon\}$, where $\mathcal{K}(f;g)=\int f\log(f/g)$ is the Kullback-Leibler divergence between $f$ and $g$. We say that the KL property holds at $f_0\in\mathfrak{D}(\mathfrak{X})$, or that $f_0$ is in the Kullback-Leibler support (KL support) of $\Pi^*$, and write $f_0\in\mathrm{KL}(\Pi^*)$, if $\Pi^*(\mathcal{K}_\epsilon(f_0))>0$ for every $\epsilon>0$. For the weak topology, the size condition in Schwartz's theorem holds automatically [16, Theorem 4.4.2]. Further, Ghosal, Ghosh and Ramamoorthi [12] argued that this property drives consistency of the parametric part in some semiparametric models.
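As a small numerical illustration of $\mathcal{K}(f;g)$ and of membership in $\mathcal{K}_\epsilon(f)$, the sketch below approximates $\mathcal{K}(f_0;g)$ by the midpoint rule for $f_0=N(0,1)$ and a hypothetical alternative $g=N(0.3,1.2^2)$, and compares it with the closed form available for two normal densities. The integration range, grid size and $\epsilon=0.1$ are assumptions chosen for this example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kl_divergence(f, g, lower, upper, n=20000):
    """Midpoint-rule approximation of K(f; g) = int f log(f / g) over [lower, upper]."""
    step = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * step
        fx = f(x)
        if fx > 0.0:  # convention 0 log 0 = 0
            total += fx * math.log(fx / g(x)) * step
    return total

def kl_normal_closed_form(mu, sigma):
    # K(N(0, 1); N(mu, sigma^2)) = log(sigma) + (1 + mu^2) / (2 sigma^2) - 1/2
    return math.log(sigma) + (1.0 + mu * mu) / (2.0 * sigma * sigma) - 0.5

def g(x):
    return normal_pdf(x, mu=0.3, sigma=1.2)

numeric = kl_divergence(normal_pdf, g, -12.0, 12.0)
exact = kl_normal_closed_form(0.3, 1.2)
in_neighborhood = numeric < 0.1  # is g in K_epsilon(f0) for epsilon = 0.1?
```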

This paper addresses the KL property of general kernel mixture priors, thus addressing one of the most important issues in posterior consistency. We discuss the KL property for general kernel mixture priors, which are not restricted to any particular type of kernel or to a particular prior distribution for the mixing distribution. The distinguishing feature of our results is that we allow the true density not to be of the chosen mixture type, and impose only simple moment conditions and qualitative conditions like continuity or positivity.

Ghosal, Ghosh and Ramamoorthi [11] presented results on consistency for the Dirichlet location mixture of a normal kernel with an additional scale parameter in terms of both the weak and $L_1$-topologies. Tokdar [29, Theorem 3.2] considered a location-scale mixture of the normal kernel and established consistency in the weak topology (weak consistency) under more relaxed conditions. If the prior $\Pi$ is chosen to be $\mathrm{DP}(\alpha)$, Tokdar [29] also weakened a moment condition on the true density in his Theorem 3.3. His Theorem 3.2 will be implied by Theorem 4 in this paper (with the choice $\lambda=0$ there). In fact, we establish the KL property for a general location-scale kernel mixture and show that such a result applies to various kernels including the skew-normal, double-exponential and logistic. This is a substantial generalization of results known thus far only for the normal kernel. Moreover, we obtain results about the KL property for priors with kernels not belonging to location-scale families, e.g., the Weibull, gamma, uniform and exponential kernels. The examples studied here provide a ready catalog of conditions required for the KL property to hold for virtually all kernel mixture priors that are of practical interest.

With the help of our results on the KL property, consistency in the $L_1$ (equivalently, Hellinger) distance can be obtained by constructing appropriate sieves approximating the class of mixtures and establishing entropy bounds for them. Since the techniques used for sieve construction and bounding entropy vary widely depending on the chosen kernel, we do not address $L_1$-consistency in this paper.

The paper is organized as follows. In Section 2, we study kernel mixture priors in complete generality, without specifying a kernel or its nature. In Section 3, using the results of Section 2, we study priors with kernels of the location-scale type. In Section 4, priors with concretely specified kernels are studied as examples, using the results of the previous sections.

2. General Kernel Mixture Priors 

First we observe that the Kullback-Leibler property is preserved under taking mixtures.

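Lemma 1. Let $f\mid\xi\sim\Pi^*_\xi$, where $\xi$ is an indexing parameter following a prior $\pi$, and let $f_0$ be the true density. Suppose that there exists a set $B$ with $\pi(B)>0$ and $B\subset\{\xi: f_0\in\mathrm{KL}(\Pi^*_\xi)\}$. Then $f_0\in\mathrm{KL}(\Pi^*)$, where $\Pi^*=\int\Pi^*_\xi\,d\pi(\xi)$.

The proof is almost a trivial application of Fubini's theorem, since for every $\epsilon>0$,

$$\Pi^*(\mathcal{K}_\epsilon(f_0))=\int\Pi^*_\xi(\mathcal{K}_\epsilon(f_0))\,d\pi(\xi)\ge\int_B\Pi^*_\xi(\mathcal{K}_\epsilon(f_0))\,d\pi(\xi)>0.$$

In view of this result, henceforth we shall discard the indexing parameter $\xi$ from our prior.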

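Theorem 1. Let $f_0$ be the true density, $\mu$ and $\Pi$ be priors for the hyper parameter and the mixing distribution in Prior 2, and $\Pi^*$ be the prior induced by $\mu$ and $\Pi$ on $\mathfrak{D}(\mathfrak{X})$. If for any $\epsilon>0$, there exist $P_\epsilon$, $\phi_\epsilon$, $A\subset\Phi$ with $\mu(A)>0$ and $W\subset\mathfrak{M}(\Theta)$ with $\Pi(W)>0$, such that

A1. $\int f_0\log\dfrac{f_0}{f_{P_\epsilon,\phi_\epsilon}}<\epsilon$;

A2. $\int f_0\log\dfrac{f_{P_\epsilon,\phi_\epsilon}}{f_{P_\epsilon,\phi}}<\epsilon$ for every $\phi\in A$;

A3. $\int f_0\log\dfrac{f_{P_\epsilon,\phi}}{f_{P,\phi}}<\epsilon$ for every $P\in W$ and $\phi\in A$;

then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. For any $P\in W$ and $\phi\in A$,

$$\int f_0\log\frac{f_0}{f_{P,\phi}}=\int f_0\log\frac{f_0}{f_{P_\epsilon,\phi_\epsilon}}+\int f_0\log\frac{f_{P_\epsilon,\phi_\epsilon}}{f_{P_\epsilon,\phi}}+\int f_0\log\frac{f_{P_\epsilon,\phi}}{f_{P,\phi}}<3\epsilon.\qquad(1)$$

Hence,

$$\Pi^*\{f: f\in\mathcal{K}_{3\epsilon}(f_0)\}\ge(\Pi\times\mu)\{(P,\phi): P\in W,\ \phi\in A\}=\Pi(W)\,\mu(A)>0.$$

Since $\epsilon>0$ is arbitrary, $f_0\in\mathrm{KL}(\Pi^*)$. □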
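The proof of Theorem 1 rests on splitting $\int f_0\log(f_0/f_{P,\phi})$ into $\int f_0\log(f_0/f_{P_\epsilon,\phi_\epsilon})+\int f_0\log(f_{P_\epsilon,\phi_\epsilon}/f_{P_\epsilon,\phi})+\int f_0\log(f_{P_\epsilon,\phi}/f_{P,\phi})$. The split is an exact pointwise identity of log-ratios, so it can be checked numerically. The sketch below does so for a normal kernel whose scale is the hyper parameter $\phi$ and for finitely supported mixing distributions; the particular $P_\epsilon$, $\phi_\epsilon$, $P$ and $\phi$ are hypothetical values chosen only for illustration, and the integrals are midpoint-rule approximations.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def f_mix(x, P, phi):
    # f_{P,phi}(x) = int K(x; theta, phi) dP(theta) for a finitely supported P = [(theta_j, p_j)]
    return sum(p * normal_pdf(x, theta, phi) for theta, p in P)

def integral(fn, lower=-15.0, upper=15.0, n=6000):
    # midpoint rule; the integrands below are negligible outside [-15, 15]
    step = (upper - lower) / n
    return step * sum(fn(lower + (i + 0.5) * step) for i in range(n))

def f0(x):
    return normal_pdf(x, 0.0, 1.0)

P_eps, phi_eps = [(-0.5, 0.5), (0.5, 0.5)], 0.9   # hypothetical (P_eps, phi_eps)
P, phi = [(-0.45, 0.55), (0.6, 0.45)], 0.95       # hypothetical (P, phi)

term_A1 = integral(lambda x: f0(x) * math.log(f0(x) / f_mix(x, P_eps, phi_eps)))
term_A2 = integral(lambda x: f0(x) * math.log(f_mix(x, P_eps, phi_eps) / f_mix(x, P_eps, phi)))
term_A3 = integral(lambda x: f0(x) * math.log(f_mix(x, P_eps, phi) / f_mix(x, P, phi)))
total = integral(lambda x: f0(x) * math.log(f0(x) / f_mix(x, P, phi)))
```

Only the first and the total are Kullback-Leibler divergences; the middle and last terms may have either sign, which is why Conditions A2 and A3 are one-sided bounds.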
Remark 1. If $\Pi=\mathrm{DP}(\alpha)$ and $\mathrm{supp}(P_\epsilon)\subset\mathrm{supp}(\alpha)$, then $P_\epsilon\in\mathrm{supp}(\Pi)$; see, for instance, Theorem 3.2.4 of [16]. In particular, the condition holds for any chosen $P_\epsilon$ if $\alpha$ is fully supported on $\Theta$. A similar assertion holds when $\Pi$ is the Polya tree prior $\mathrm{PT}(\{\mathcal{T}_m\},\mathcal{A})$ (see [20]). Let $\{\mathcal{T}_m\}$ be a collection of gradually refining binary partitions and $\mathcal{A}=\{\alpha_{\varepsilon_1\cdots\varepsilon_m}:\varepsilon_1,\ldots,\varepsilon_m=0\text{ or }1,\ m\ge1\}$. If the end points of $\mathcal{T}_m$ form a dense subset of some set $S$ with $S\supset\mathrm{supp}(P_\epsilon)$, and the elements of $\mathcal{A}$, which control the beta distributions regulating the mass allocation to the sets in $\mathcal{T}_m$, are positive, then also $P_\epsilon\in\mathrm{supp}(\Pi)$. This is implicit in Theorem 5 of [20] or Theorem 3.3.6 of [16]; for an explicit statement and proof, see Theorem 2.20 of [15]. Now, if $W$ is an open neighborhood of $P_\epsilon$, then $\Pi(W)>0$ holds.



Remark 2. Assume that $\phi_\epsilon\in\mathrm{supp}(\mu)$. Condition A2 clearly holds with $A$ an open neighborhood of $\phi_\epsilon$, assuming that $\phi\mapsto\int f_0\log(f_{P_\epsilon,\phi_\epsilon}/f_{P_\epsilon,\phi})$ is continuous at $\phi_\epsilon$.

In most applications, we can choose $P_\epsilon$ to be compactly supported. Compactness of $\mathrm{supp}(P_\epsilon)$ often helps satisfy Conditions A4-A9 in Lemmas 2 and 3, which are useful in verifying the conditions of Theorem 1.

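Lemma 2. Let $f_0$, $\Pi$, $\mu$ and $\Pi^*$ be the same as in Theorem 1. If for any $\epsilon>0$, there exist $P_\epsilon$, a set $D$ containing $\mathrm{supp}(P_\epsilon)$, and $\phi_\epsilon\in\mathrm{supp}(\mu)$ such that A1 holds and the kernel function $K$ satisfies

A4. for any given $x$ and $\theta$, the map $\phi\mapsto K(x;\theta,\phi)$ is continuous on the interior of the support of $\mu$;

A5. for every $\phi\in N(\phi_\epsilon)$, where $N(\phi_\epsilon)$ is an open neighborhood of $\phi_\epsilon$,

$$\int\left\{\left|\log\frac{\sup_{\theta\in D}K(x;\theta,\phi_\epsilon)}{\inf_{\theta\in D}K(x;\theta,\phi)}\right|+\left|\log\frac{\sup_{\theta\in D}K(x;\theta,\phi)}{\inf_{\theta\in D}K(x;\theta,\phi_\epsilon)}\right|\right\}f_0(x)\,dx<\infty;$$

A6. for any given $x\in\mathfrak{X}$, $\theta\in D$ and $\phi\in N(\phi_\epsilon)$, there exists $g(x,\theta)$ such that $g(x,\theta)\ge K(x;\theta,\phi)$ and $\int g(x,\theta)\,dP_\epsilon(\theta)<\infty$;

then there exists a set $A\subset\Phi$ such that A2 holds.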

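Proof. By Condition A4, we have that $K(x;\theta,\phi)\to K(x;\theta,\phi_\epsilon)$ as $\phi\to\phi_\epsilon$, for any given $x$ and $\theta$. By Condition A6 and the dominated convergence theorem (DCT), $f_{P_\epsilon,\phi}(x)\to f_{P_\epsilon,\phi_\epsilon}(x)$ as $\phi\to\phi_\epsilon$, for any given $x$. Equivalently, this can be written as

$$\log\frac{f_{P_\epsilon,\phi_\epsilon}(x)}{f_{P_\epsilon,\phi}(x)}\to0\quad\text{pointwise, as }\phi\to\phi_\epsilon.$$

Note that, since $P_\epsilon(D)=1$,

$$\left|\log\frac{f_{P_\epsilon,\phi_\epsilon}}{f_{P_\epsilon,\phi}}\right|\le\begin{cases}\log\dfrac{\sup_{\theta\in D}K(x;\theta,\phi_\epsilon)}{\inf_{\theta\in D}K(x;\theta,\phi)}, & \text{if }\dfrac{f_{P_\epsilon,\phi_\epsilon}}{f_{P_\epsilon,\phi}}>1,\\[2ex]\log\dfrac{\sup_{\theta\in D}K(x;\theta,\phi)}{\inf_{\theta\in D}K(x;\theta,\phi_\epsilon)}, & \text{if }\dfrac{f_{P_\epsilon,\phi_\epsilon}}{f_{P_\epsilon,\phi}}<1.\end{cases}\qquad(2)$$

By Condition A5 and the DCT, $\int f_0\log(f_{P_\epsilon,\phi_\epsilon}/f_{P_\epsilon,\phi})\to0$ as $\phi\to\phi_\epsilon$. Hence, for the given $\epsilon>0$, there exists $\delta>0$ such that $\int f_0\log(f_{P_\epsilon,\phi_\epsilon}/f_{P_\epsilon,\phi})<\epsilon$ if $|\phi-\phi_\epsilon|<\delta$.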
Y. Wu and S. Ghosal/Kullback Leibler property of kernel mixture priors 303

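If $A=\{\phi:|\phi-\phi_\epsilon|<\delta\}\cap N(\phi_\epsilon)$, then $\int f_0\log(f_{P_\epsilon,\phi_\epsilon}/f_{P_\epsilon,\phi})<\epsilon$ for all $\phi\in A$. The proof is completed by noticing that $\mu(A)>0$, since $A$ is an open neighborhood of $\phi_\epsilon\in\mathrm{supp}(\mu)$. □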
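The conclusion of Lemma 2 can be seen numerically in a simple case: with a normal kernel whose scale is the hyper parameter, the integral $\int f_0\log(f_{P_\epsilon,\phi_\epsilon}/f_{P_\epsilon,\phi})$ of Condition A2 shrinks as $\phi\to\phi_\epsilon$. The sketch assumes $f_0=N(0,1)$, $\phi_\epsilon=1$ and a hypothetical two-point $P_\epsilon$; it illustrates the conclusion, not the dominated convergence argument itself.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def f_mix(x, P, phi):
    # f_{P,phi}(x) = sum_j p_j N(x; theta_j, phi^2): normal kernel with scale hyper parameter phi
    return sum(p * normal_pdf(x, theta, phi) for theta, p in P)

def a2_term(P_eps, phi_eps, phi, lower=-15.0, upper=15.0, n=6000):
    """Midpoint-rule value of int f0 log(f_{P_eps,phi_eps} / f_{P_eps,phi}) with f0 = N(0, 1)."""
    step = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * step
        ratio = f_mix(x, P_eps, phi_eps) / f_mix(x, P_eps, phi)
        total += normal_pdf(x, 0.0, 1.0) * math.log(ratio) * step
    return total

# hypothetical two-point P_eps given as (atom, weight) pairs, and phi_eps = 1
P_eps, phi_eps = [(-0.3, 0.5), (0.3, 0.5)], 1.0
phis = [1.5, 1.2, 1.05, 1.01]
terms = [a2_term(P_eps, phi_eps, ph) for ph in phis]
```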

Lemma 3. Let $f_0$, $\Pi$, $\mu$ and $\Pi^*$ be the same as in Theorem 1. If for any $\epsilon>0$, there exist $P_\epsilon\in\mathrm{supp}(\Pi)$, $\phi_\epsilon\in\mathrm{supp}(\mu)$, and $A\subset\Phi$ with $\mu(A)>0$ such that Conditions A1 and A2 hold and, for some closed $D\supset\mathrm{supp}(P_\epsilon)$, the kernel function $K$ and prior $\Pi$ satisfy

A7. for any $\phi\in A$, $\int\log\dfrac{f_{P_\epsilon,\phi}(x)}{\inf_{\theta\in D}K(x;\theta,\phi)}\,f_0(x)\,dx<\infty$;

A8. $c:=\inf_{x\in C}\inf_{\theta\in D}K(x;\theta,\phi)>0$, for any compact $C\subset\mathfrak{X}$;

A9. for any given $\phi\in A$ and compact $C\subset\mathfrak{X}$, there exists $E$ containing $D$ in its interior such that the family of maps $\{\theta\mapsto K(x;\theta,\phi): x\in C\}$ is uniformly equicontinuous on $E\subset\Theta$, and $\sup\{K(x;\theta,\phi): x\in C,\ \theta\in E^c\}<c\epsilon/4$;

then there exists $W\subset\mathfrak{M}(\Theta)$ such that Condition A3 holds and $\Pi(W)>0$.

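Proof. For any $\phi\in A$, write

$$\int f_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{f_{P,\phi}(x)}\,dx=\int_Cf_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{f_{P,\phi}(x)}\,dx+\int_{C^c}f_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{f_{P,\phi}(x)}\,dx.\qquad(3)$$

Now, since $P_\epsilon(D)=1>\frac12$, $\mathcal{V}=\{P:P(D)>\frac12\}$ is an open neighborhood of $P_\epsilon$ by the Portmanteau theorem. For any $P\in\mathcal{V}$ and $\phi\in A$,

$$\int_{C^c}f_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{f_{P,\phi}(x)}\,dx\le\int_{C^c}f_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{\int_DK(x;\theta,\phi)\,dP(\theta)}\,dx\le\int_{C^c}f_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{\inf_{\theta\in D}K(x;\theta,\phi)}\,dx+\log2\cdot P_{f_0}(C^c),$$

where $P_{f_0}$ is the probability measure corresponding to $f_0$. By Condition A7, there exists a compact $C\subset\mathfrak{X}$ such that

$$\int_{C^c}f_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{\inf_{\theta\in D}K(x;\theta,\phi)}\,dx<\epsilon/4.\qquad(4)$$

We can further ensure that $P_{f_0}(C^c)<\epsilon/4$, so the bound for $\int_{C^c}f_0\log(f_{P_\epsilon,\phi}/f_{P,\phi})$ is less than $\epsilon/2$. Now, if we can show that for the given $\epsilon>0$ there exists a weak neighborhood $\mathcal{U}$ of $P_\epsilon$ such that $\int_Cf_0(x)\log(f_{P_\epsilon,\phi}(x)/f_{P,\phi}(x))\,dx<\epsilon/2$ for any $P\in\mathcal{U}$ and $\phi\in A$, then Lemma 3 is proved by letting $W=\mathcal{U}\cap\mathcal{V}$.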

Observe that, for any given $\phi\in A$, the family of maps $\{\theta\mapsto K(x;\theta,\phi): x\in C\}$ is uniformly equicontinuous on $E\subset\Theta$. By the Arzelà-Ascoli theorem (see [27, p. 169]), for any $\delta>0$, there exist $x_1,x_2,\ldots,x_m$ such that, for any $x\in C$,

$$\sup_{\theta\in E}|K(x;\theta,\phi)-K(x_i;\theta,\phi)|<c\delta\qquad(5)$$

for some $i=1,2,\ldots,m$.

Let $\mathcal{U}=\{P:|\int_EK(x_i;\theta,\phi)\,dP_\epsilon(\theta)-\int_EK(x_i;\theta,\phi)\,dP(\theta)|<c\delta,\ i=1,2,\ldots,m\}$. Then $\mathcal{U}$ is an open weak neighborhood of $P_\epsilon$, since $P_\epsilon(\partial E)=0$, and hence $\Pi(\mathcal{U})>0$ as $P_\epsilon\in\mathrm{supp}(\Pi)$. For any $x\in C$, choosing $x_i$ to satisfy (5), we have that



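$$\left|\int_\Theta K(x;\theta,\phi)\,dP(\theta)-\int_\Theta K(x;\theta,\phi)\,dP_\epsilon(\theta)\right|\le\sup\{K(x;\theta,\phi):\theta\in E^c,\ x\in C\}+\left|\int_EK(x;\theta,\phi)\,dP(\theta)-\int_EK(x_i;\theta,\phi)\,dP(\theta)\right|$$
$$+\left|\int_EK(x_i;\theta,\phi)\,dP(\theta)-\int_EK(x_i;\theta,\phi)\,dP_\epsilon(\theta)\right|+\left|\int_EK(x_i;\theta,\phi)\,dP_\epsilon(\theta)-\int_EK(x;\theta,\phi)\,dP_\epsilon(\theta)\right|$$
$$<\frac{c\epsilon}{4}+2c\delta+\left|\int_EK(x_i;\theta,\phi)\,dP_\epsilon(\theta)-\int_EK(x_i;\theta,\phi)\,dP(\theta)\right|<c\left(3\delta+\frac{\epsilon}{4}\right)\qquad(6)$$

if $P\in\mathcal{U}$. Also $\int_\Theta K(x;\theta,\phi)\,dP_\epsilon(\theta)\ge c$ for any $x\in C$, since $P_\epsilon$ has support in $D$. Hence, given $\phi\in A$, for any $P\in\mathcal{U}$ and $x\in C$,

$$\left|\frac{\int_\Theta K(x;\theta,\phi)\,dP(\theta)}{\int_\Theta K(x;\theta,\phi)\,dP_\epsilon(\theta)}-1\right|<3\delta+\frac{\epsilon}{4}.$$

Then, for $3\delta+\epsilon/4<1$,

$$\left|\frac{\int_\Theta K(x;\theta,\phi)\,dP_\epsilon(\theta)}{\int_\Theta K(x;\theta,\phi)\,dP(\theta)}-1\right|<\frac{3\delta+\epsilon/4}{1-3\delta-\epsilon/4}.$$

By choosing $\delta$ small enough, we can ensure that the right hand side (RHS) of the last display is less than $\epsilon/2$. Hence, since $\log y\le y-1$, for any given $\phi\in A$,

$$\int_Cf_0(x)\log\frac{f_{P_\epsilon,\phi}(x)}{f_{P,\phi}(x)}\,dx\le\sup_{x\in C}\left|\frac{\int_\Theta K(x;\theta,\phi)\,dP_\epsilon(\theta)}{\int_\Theta K(x;\theta,\phi)\,dP(\theta)}-1\right|<\epsilon/2$$

for any $P\in\mathcal{U}$. □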
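The last step of the proof of Lemma 3 uses two elementary facts: $\log y\le y-1$ for $y>0$, and the inversion bound $|b/a-1|<t<1\Rightarrow|a/b-1|<t/(1-t)$, which turns closeness of $\int K\,dP$ to $\int K\,dP_\epsilon$ into closeness of the reciprocal ratio. A randomized check in Python follows; the sampling ranges are arbitrary choices for this sketch, and a random check is of course not a proof.

```python
import math
import random

def inversion_bound(t):
    """If |b/a - 1| < t < 1, then |a/b - 1| < t / (1 - t)."""
    return t / (1.0 - t)

rng = random.Random(0)
violations = 0
for _ in range(100000):
    t = rng.uniform(0.01, 0.9)
    a = rng.uniform(0.1, 10.0)
    b = a * (1.0 + 0.999 * rng.uniform(-t, t))  # guarantees |b/a - 1| < t
    r = a / b
    if abs(r - 1.0) >= inversion_bound(t):
        violations += 1
    if math.log(r) > (r - 1.0) + 1e-12:  # log y <= y - 1 for y > 0
        violations += 1
```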



3. Location scale kernel 

In this section we discuss priors with kernel functions belonging to location-scale families. We write the kernels as $K(x;\theta,h)=h^{-d}\chi\left(\frac{x-\theta}{h}\right)$, where $\chi(\cdot)$ is a probability density function defined on $\mathbb{R}^d$, $x=(x_1,\ldots,x_d)$ and $\theta=(\theta_1,\ldots,\theta_d)$ are $d$-dimensional vectors, and $h\in(0,\infty)$. Let $\|x\|$ denote $\sqrt{x_1^2+x_2^2+\cdots+x_d^2}$, and let $\chi_i'(x)$ denote $\partial\chi(x)/\partial x_i$. Obviously, when $d=1$, this reduces to the ordinary derivative and $\|\cdot\|$ denotes the absolute value. We have the following theorems, whose proofs use some ideas from the proof of Theorem 3.2 of [29].

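Theorem 2. Let $f_0(x)$ be the true density and $\Pi^*$ be a type I prior on $\mathfrak{D}(\mathbb{R}^d)$ with kernel function $h^{-d}\chi\left(\frac{x-\theta}{h}\right)$, $P\sim\Pi$ and, given $P$, $(\theta,h)\sim P$. If $\chi(\cdot)$ and $f_0(x)$ satisfy:

B1. $\chi(\cdot)$ is bounded, continuous and positive everywhere;

B2. there exists $l_1>0$ such that $\chi(x)$ decreases as $x$ moves away from $0$ outside the ball $\{x:\|x\|<l_1\}$;

B3. there exists $l_2>0$ such that $\sum_{i=1}^dz_i\chi_i'(z)/\chi(z)<-d$ for $\|z\|\ge l_2$;

B4. for some $0<M<\infty$, $0<f_0(x)\le M$ for all $x$;

B5. $\left|\int f_0(x)\log f_0(x)\,dx\right|<\infty$;

B6. for some $\delta>0$, $\int f_0(x)\log\frac{f_0(x)}{\phi_\delta(x)}\,dx<\infty$, where $\phi_\delta(x)=\inf_{\|t-x\|<\delta}f_0(t)$;

B7. there exists $\eta>0$ such that $\int f_0(x)\left|\log\chi(2x\|x\|^\eta)\right|dx<\infty$ and $\int f_0(x)\left|\log\chi\left(\frac{x-a}{b}\right)\right|dx<\infty$ for any $a\in\mathbb{R}^d$, $b\in(0,\infty)$;

B8. the weak support of $\Pi$ is $\mathfrak{M}(\mathbb{R}^d\times\mathbb{R}_+)$;

B9. when $d\ge2$, $\chi(y)=o(\|y\|^{-d})$ as $\|y\|\to\infty$;

then $f_0\in\mathrm{KL}(\Pi^*)$.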

Remark 3. Tokdar [29] assumed that the weak support of $\Pi$ includes all compactly supported probabilities on $\mathbb{R}^d\times\mathbb{R}_+$. Then automatically the weak support of $\Pi$ is $\mathfrak{M}(\mathbb{R}^d\times\mathbb{R}_+)$, because any probability measure can be weakly approximated by a sequence of compactly supported probability measures.

Proof of Theorem 2. We prove this theorem by verifying the conditions in Theorem 1. Since there is no hyper parameter in Prior 1, we only need to show that Conditions A1 and A3 are met.

To show that Condition A1 is met, we define

$$f_m(x)=\begin{cases}t_mf_0(x), & \|x\|<m,\\ 0, & \text{otherwise},\end{cases}\qquad m\ge1,$$

where $t_m^{-1}=\int_{\|x\|<m}f_0(x)\,dx$, $h_m=m^{-\eta}$ with $\eta$ as in Condition B7, $F_m$ is the probability measure corresponding to $f_m$, and $P_m=F_m\times\delta(h_m)$, where $\delta(\cdot)$ is the degenerate distribution. Obviously, $P_m$ is compactly supported. Then, using the transformation $a=(x-\theta)/h_m$,

$$f_{P_m}(x)=\int\frac{1}{h_m^d}\chi\left(\frac{x-\theta}{h_m}\right)dF_m(\theta)=t_m\int_{\|\theta\|<m}\frac{1}{h_m^d}\chi\left(\frac{x-\theta}{h_m}\right)f_0(\theta)\,d\theta=t_m\int_{\|x-ah_m\|<m}\chi(a)f_0(x-ah_m)\,da.$$

Since, for any given $a$, $\chi(a)f_0(x-ah_m)\to\chi(a)f_0(x)$ as $h_m\to0$, $t_m\to1$, and $f_0$ is bounded, by the DCT we obtain $f_{P_m}(x)\to f_0(x)$.
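Now, to satisfy Condition A1, we show that

$$\int f_0(x)\log\frac{f_0(x)}{f_{P_m}(x)}\,dx\to0\quad\text{as }m\to\infty.$$

To this end, observe that

$$f_{P_m}(x)=t_m\int_{\|\theta\|<m}\frac{1}{h_m^d}\chi\left(\frac{x-\theta}{h_m}\right)f_0(\theta)\,d\theta\le Mt_m\int\frac{1}{h_m^d}\chi\left(\frac{x-\theta}{h_m}\right)d\theta=Mt_m\le Mt_1.$$

Hence, as $\log(f_{P_m}(x)/(Mt_1))\le0$,

$$\log\frac{f_0(x)}{f_{P_m}(x)}\ge\log\frac{f_0(x)}{Mt_1}.\qquad(7)$$

Also

$$\int f_0(x)\log\frac{f_0(x)}{f_{P_m}(x)}\,dx=\int_{\|x\|<m}f_0(x)\log\frac{f_0(x)}{f_{P_m}(x)}\,dx+\int_{\|x\|\ge m}f_0(x)\log\frac{f_0(x)}{f_{P_m}(x)}\,dx.$$

Let $m>l_1$. Now, for $\|x\|\ge m$, using assumption B2,

$$f_{P_m}(x)=t_m\int_{\|\theta\|<m}\frac{1}{h_m^d}\chi\left(\frac{x-\theta}{h_m}\right)f_0(\theta)\,d\theta\ge\frac{1}{h_m^d}\chi\left(\frac{x+mx/\|x\|}{h_m}\right)t_m\int_{\|\theta\|<m}f_0(\theta)\,d\theta=m^{d\eta}\chi\left(m^\eta\left(x+\frac{mx}{\|x\|}\right)\right)\ge\|x\|^{d\eta}\chi(2\|x\|^\eta x).\qquad(8)$$

The last inequality holds when $m\ge T_0$ and $T\mapsto T^{d\eta}\chi\left(T^\eta(x+Tx/\|x\|)\right)$ is decreasing for $T\ge T_0$.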
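This follows because, with $z=T^\eta x+T^{\eta+1}x/\|x\|$, a positive multiple of $x$ with $\|z\|\ge T^{\eta+1}$,

$$\frac{d}{dT}\left[d\eta\log T+\log\chi\left(T^\eta x+\frac{T^{\eta+1}x}{\|x\|}\right)\right]=\frac1T\left[d\eta+\frac{\eta+(\eta+1)T/\|x\|}{1+T/\|x\|}\sum_{i=1}^d\frac{z_i\chi_i'(z)}{\chi(z)}\right]<\frac1T\,(d\eta-d\eta)=0$$

for $T\ge T_0:=l_2^{1/(\eta+1)}$, by Condition B3, since the factor multiplying the sum is at least $\eta$.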

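For $\|x\|<m$, let $\delta>0$ be as in Condition B6, and let $\phi^*_m(x)=\inf_{\|t-x\|<\delta h_m}f_0(t)$. Then

$$f_{P_m}(x)\ge\int_{\{\|\theta\|<m\}\cap\{\|\theta-x\|<\delta h_m\}}\frac{1}{h_m^d}\chi\left(\frac{x-\theta}{h_m}\right)f_0(\theta)\,d\theta\ge\phi^*_m(x)\int_{\{\|x-uh_m\|<m\}\cap\{\|u\|<\delta\}}\chi(u)\,du\ge\phi^*_m(x)\int_{\prod_{i=1}^d[0,\,\mathrm{sign}(x_i)\delta/\sqrt d]}\chi(u)\,du,$$

with the convention that $[a,b]=[b,a]$ if $b<a$. The last inequality holds because, when $\|x\|<m$,

$$\left\{u:u\in\prod_{i=1}^d[0,\,\mathrm{sign}(x_i)\delta/\sqrt d]\right\}\subset\{u:\|x/h_m-u\|<m/h_m\text{ and }\|u\|<\delta\}.$$

We have $h_m\le1$, so that $\phi^*_m(x)\ge\phi_\delta(x)$. Let

$$c=\min_{(s_1,\ldots,s_d)\in\{-1,1\}^d}\int_{\prod_{i=1}^d[0,\,s_i\delta/\sqrt d]}\chi(u)\,du,$$

which is positive by Condition B1. Then $f_{P_m}(x)\ge c\phi_\delta(x)$ for all $\|x\|<m$. For $0<R<m$, combining this with (8),

$$f_{P_m}(x)\ge\begin{cases}c\phi_\delta(x), & \|x\|<R,\\ \min\left(\|x\|^{d\eta}\chi(2\|x\|^\eta x),\ c\phi_\delta(x)\right), & \|x\|\ge R,\end{cases}$$

and hence

$$\log\frac{f_0(x)}{f_{P_m}(x)}\le\xi(x):=\begin{cases}\log\dfrac{f_0(x)}{c\phi_\delta(x)}, & \|x\|<R,\\[2ex]\max\left(\log\dfrac{f_0(x)}{\|x\|^{d\eta}\chi(2\|x\|^\eta x)},\ \log\dfrac{f_0(x)}{c\phi_\delta(x)}\right), & \|x\|\ge R.\end{cases}\qquad(9)$$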
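Combining (7) and (9), we obtain

$$\left|\log\frac{f_0(x)}{f_{P_m}(x)}\right|\le\max\left\{\xi(x),\ \left|\log\frac{f_0(x)}{Mt_1}\right|\right\}.$$

From Condition B5, since $f_0\le M\le Mt_1$,

$$\int\left|\log\frac{f_0(x)}{Mt_1}\right|f_0(x)\,dx=\log(Mt_1)-\int f_0(x)\log f_0(x)\,dx<\infty.$$

Now

$$\int\xi(x)f_0(x)\,dx=\int_{\|x\|<R}f_0(x)\log\frac{f_0(x)}{c\phi_\delta(x)}\,dx+\int_{\|x\|\ge R}f_0(x)\max\left(\log\frac{f_0(x)}{\|x\|^{d\eta}\chi(2\|x\|^\eta x)},\ \log\frac{f_0(x)}{c\phi_\delta(x)}\right)dx.$$

Hence,

$$\int\xi(x)f_0(x)\,dx\le\int f_0(x)\log\frac{f_0(x)}{c\phi_\delta(x)}\,dx+\int_{\|x\|\ge R,\ f_0(x)>\|x\|^{d\eta}\chi(2\|x\|^\eta x)}f_0(x)\log\frac{f_0(x)}{\|x\|^{d\eta}\chi(2\|x\|^\eta x)}\,dx,$$

since $\max(x_1,x_2)\le x_1+\max(x_2,0)$ if $x_1\ge0$ (here $\log(f_0/(c\phi_\delta))\ge0$, as $\phi_\delta\le f_0$ and $c\le1$). The first term on the RHS of the above inequality is finite, by Condition B6. By Conditions B5 and B7, the second term is also finite. Thus, by the DCT, $\int f_0(x)\log\frac{f_0(x)}{f_{P_m}(x)}\,dx\to0$ as $m\to\infty$, i.e., Condition A1 is satisfied.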
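The convergence $\int f_0\log(f_0/f_{P_m})\to0$ can be illustrated numerically in the simplest case $d=1$, $\chi=f_0=N(0,1)$ and $\eta=1$, so that $h_m=1/m$. The sketch below discretizes $F_m$ (the renormalized restriction of $f_0$ to $(-m,m)$) on a grid and approximates the Kullback-Leibler divergence by a Riemann sum. The grid sizes and the truncation of the $x$-range to $[-8,8]$ are assumptions of the sketch, and the numbers only indicate the trend.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)

def phi(z):
    # standard normal density; used here both as the kernel chi and as the true density f0
    return math.exp(-0.5 * z * z) / SQRT2PI

def kl_f0_to_fPm(m, eta=1.0, step_theta=0.01, step_x=0.02, x_max=8.0):
    """Riemann-sum approximation of K(f0; f_{P_m}) for P_m = F_m x delta(h_m), h_m = m**(-eta)."""
    h = m ** (-eta)
    n_theta = int(round(2 * m / step_theta))
    thetas = [-m + step_theta * (j + 0.5) for j in range(n_theta)]  # midpoints inside (-m, m)
    raw = [phi(t) for t in thetas]
    norm = sum(raw)
    weights = [r / norm for r in raw]  # discretized t_m f0(theta) d(theta); sums to one
    kl = 0.0
    n_x = int(round(2 * x_max / step_x))
    for i in range(n_x):
        x = -x_max + step_x * (i + 0.5)
        f_pm = sum(w * phi((x - t) / h) / h for w, t in zip(weights, thetas))
        fx = phi(x)
        kl += fx * math.log(fx / f_pm) * step_x
    return kl

kls = {m: kl_f0_to_fPm(m) for m in (1, 2, 4)}
```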

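We show that Condition A3 is met by verifying the conditions of Lemma 3. First, from the proof above, we see that for any $\epsilon>0$, there exists $m_\epsilon$ such that $\int f_0(x)\log\frac{f_0(x)}{f_{P_{m_\epsilon}}(x)}\,dx<\epsilon$. Let $P_\epsilon$ in Theorem 1 be chosen to be $P_{m_\epsilon}$, which is compactly supported. By Condition B8, $P_\epsilon\in\mathrm{supp}(\Pi)$. Second, Condition A7 is satisfied. To show that $\log\dfrac{f_{P_\epsilon}(x)}{\inf_{(\theta,h)\in D}h^{-d}\chi((x-\theta)/h)}$ is $f_0$-integrable, it suffices to show that $\log f_{P_\epsilon}(x)$ and $\log\inf_{(\theta,h)\in D}h^{-d}\chi((x-\theta)/h)$ are both $f_0$-integrable. Without loss of generality, let $D=\{\|\theta\|\le a^*\}\times[\underline h,\overline h]$, where $a^*>m_\epsilon$ and $0<\underline h<m_\epsilon^{-\eta}<\overline h$. For $\|x\|\le a^*$, $\log\inf_{(\theta,h)\in D}h^{-d}\chi((x-\theta)/h)$ is bounded. For $\|x\|>a^*$,

$$\log\inf_{(\theta,h)\in D}\frac{1}{h^d}\chi\left(\frac{x-\theta}{h}\right)\ge\log\frac{1}{\overline h^d}+\log\chi\left(\frac{x+a^*x/\|x\|}{\underline h}\right).\qquad(10)$$

By Condition B7 and expression (10), $\log\inf_{(\theta,h)\in D}h^{-d}\chi((x-\theta)/h)$ is $f_0$-integrable. Consider $f_{P_\epsilon}(x)=\int h^{-d}\chi((x-\theta)/h)\,dP_\epsilon$. With $D=\{\|\theta\|\le a^*\}\times[\underline h,\overline h]$,

$$\log\left[\frac{1}{\overline h^d}\chi\left(\frac{x+a^*x/\|x\|}{\underline h}\right)P_\epsilon(D)\right]\le\log f_{P_\epsilon}(x)\le\log\left[\frac{1}{\underline h^d}\sup\chi\right]$$

for $\|x\|>a^*$. Hence, $\log f_{P_\epsilon}(x)$ is also $f_0$-integrable by a similar argument.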
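Condition A8 is satisfied by Condition B1.

We show that Condition A9 is also satisfied. Let $C\subset\mathfrak{X}$ be a given compact set. First we show that $\{h^{-d}\chi((x-\theta)/h): x\in C\}$ is uniformly equicontinuous as a family of functions of $(\theta,h)$ on $E=[-a,a]^d\times[\underline h/2,2\overline h]$, where $a>a^*$. Such an $E$ contains $D$ in its interior, and is compact. By the definition of uniform equicontinuity, it suffices to show that for any $\epsilon>0$, there exists $\delta>0$ such that for all $x\in C$ and all $(\theta,h),(\theta',h')\in E$ with $\|(\theta,h)-(\theta',h')\|<\delta$, we have $|h^{-d}\chi(\frac{x-\theta}{h})-h'^{-d}\chi(\frac{x-\theta'}{h'})|<\epsilon$. Observe that

$$\left|\frac{1}{h^d}\chi\left(\frac{x-\theta}{h}\right)-\frac{1}{h'^d}\chi\left(\frac{x-\theta'}{h'}\right)\right|\le\frac{1}{h^d}\left|\chi\left(\frac{x-\theta}{h}\right)-\chi\left(\frac{x-\theta'}{h'}\right)\right|+\frac{|h'^d-h^d|}{h^dh'^d}\chi\left(\frac{x-\theta'}{h'}\right).\qquad(11)$$

Since $E$ and $C$ are compact and $h$ is bounded away from $0$ within $E$, the set $\{\frac{x-\theta}{h}: x\in C,(\theta,h)\in E\}$ is also compact. Hence $c_1=\sup_{x\in C,(\theta,h)\in E}\chi(\frac{x-\theta}{h})$ is finite, by the continuity of $\chi(\cdot)$. Since $h\mapsto h^{-d}$ is uniformly continuous on $[\underline h/2,2\overline h]$, there exists $\delta^*>0$ such that $\frac{|h'^d-h^d|}{h^dh'^d}=|h^{-d}-h'^{-d}|<\frac{\epsilon}{2c_1}$ whenever $|h'-h|<\delta^*$; hence the last term in (11) is less than $\epsilon/2$. Since $\{\frac{x-\theta}{h}: x\in C,(\theta,h)\in E\}$ is compact, $\chi(\cdot)$ is uniformly continuous on it. For any given $\epsilon>0$, there exists $\delta^{**}>0$ such that whenever $x\in C$ and $(\theta,h),(\theta',h')\in E$ with $\|\frac{x-\theta}{h}-\frac{x-\theta'}{h'}\|<\delta^{**}$, we have $|\chi(\frac{x-\theta}{h})-\chi(\frac{x-\theta'}{h'})|<\epsilon\underline h^d/2^{d+1}$, which ensures that the first term on the RHS of (11) is less than $\epsilon/2$. Notice that $\|\frac{x-\theta}{h}-\frac{x-\theta'}{h'}\|<\delta^{**}$ is equivalent to

$$\|(h-h')\theta+(\theta'-\theta)h+(h'-h)x\|<hh'\delta^{**}.\qquad(12)$$

Since $\theta$, $h$ and $x$ range over bounded sets and $hh'\ge\underline h^2/4$ on $E$, relation (12) holds whenever $\|\theta-\theta'\|$ and $|h-h'|$ are smaller than a suitable positive constant depending only on $a$, $\underline h$, $\overline h$, $C$ and $\delta^{**}$. Hence, taking $\delta>0$ no larger than this constant and $\delta^*$, for all $x\in C$ and all $(\theta,h),(\theta',h')\in E$ with $\|(\theta,h)-(\theta',h')\|<\delta$, we have $|h^{-d}\chi(\frac{x-\theta}{h})-h'^{-d}\chi(\frac{x-\theta'}{h'})|<\epsilon$. Thus the uniform equicontinuity required in Condition A9 is satisfied.

We can enlarge $E$ to ensure that $h^{-d}\chi(\frac{x-\theta}{h})$ is less than any preassigned number for $x\in C$ and $(\theta,h)\in E^c$. This holds for large values of $h$, since $\chi(\cdot)$ is bounded. For small values of $h$, notice that $h^{-d}\chi(\frac{x-\theta}{h})=h^{-d}o(\|\frac{x-\theta}{h}\|^{-d})=o(\|x-\theta\|^{-d})$. This follows from Assumption B9 when $d\ge2$. For $d=1$, the condition automatically holds, since $\int\chi(y)\,dy=1$ implies $\chi(y)=o(|y|^{-1})$ with the help of the monotonicity condition B2. For given $C$, choosing $a$ and $\overline h$ large enough to construct the set $E$, we have $\sup\{h^{-d}\chi(\frac{x-\theta}{h}): x\in C,\ (\theta,h)\in E^c\}<c\epsilon/4$, for any given $\epsilon$. □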




Now we consider Prior 2 with location-scale family kernels. Let the location parameter of the density be mixed according to $P$ following a prior $\Pi$. Let the scale parameter $h$ be a hyper parameter, which is also given a prior distribution $\mu$. Assume that $h$ and $P$ are a priori independently distributed. We let $\Pi^*$ denote the prior for the density functions on $\mathfrak{X}$, induced by $\Pi\times\mu$ via the mapping $(P,h)\mapsto f_{P,h}=\int h^{-d}\chi(\frac{x-\theta}{h})\,dP(\theta)$. We then have the following theorem.

Theorem 3. For such a prior, let $\chi(\cdot)$ and $f_0(x)$ be densities on $\mathfrak{X}$ satisfying Conditions B1-B9. Then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. The proof uses Theorem 1 and Lemmas 2 and 3. Verification of Conditions A7-A9 is similar to (but easier than) that in Theorem 2. The second integrability condition in Condition B7 implies that Condition A5 is satisfied. Conditions A4 and A6 are satisfied since $\chi(\cdot)$ is a continuous probability density function and the kernel we consider here is a location family of $\chi(\cdot)$ with a fixed scale. Condition A1 is proved in the same way as in the proof of Theorem 2. □

4. Examples 

In this section, we discuss the KL property for some kernel mixture priors with concretely specified kernels. More precisely, we prove that the property holds under some conditions on the true density when the kernel is chosen to be the skew-normal (including the normal, as a special case), multivariate normal, logistic, double exponential, t (including the Cauchy, as a special case), histogram, triangular, uniform, scaled uniform, exponential, log-normal, gamma, inverse gamma and Weibull densities.

4.1. Location-scale kernels

For a given density $\chi(\cdot)$ supported on the entire real line (or on $\mathbb{R}^d$ when $\mathfrak{X}$ is $d$-dimensional), we shall consider two types of kernel mixture prior: Prior 1, where both the location parameter $\theta$ and the scale parameter $\phi$ of $\phi^{-d}\chi((x-\theta)/\phi)$ are mixed according to a random probability measure on $\mathbb{R}^d\times(0,\infty)$, and Prior 2, where $\theta$ is mixed according to a random probability measure $P$ on $\mathbb{R}^d$ and $\phi$ is given a prior $\mu$ on $(0,\infty)$. The KL property may be verified by checking Conditions B1-B9 for the kernel and applying Theorem 2 or Theorem 3, respectively.

In this subsection, we consider several examples of location-scale kernels. Conditions B1 and B2 can be easily verified. Conditions B4-B6 are assumed in all the following theorems for each of the location-scale density kernels. By choosing the prior on $P$ as described in Remark 1, Condition B8 can be satisfied. In this subsection, only the multivariate normal density has a mixing parameter $\theta$ of dimension $d\ge2$, and for this kernel Condition B9 is obviously satisfied. Hence, in the rest of this subsection, for each kernel function and corresponding prior, we only show that Conditions B3 and B7 are satisfied.
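Condition B3 can be checked numerically for the one-dimensional kernels of this subsection. The sketch takes B3, for $d=1$, to read $z\chi'(z)/\chi(z)<-1$ whenever $|z|\ge l_2$, uses closed-form derivatives of $\log\chi$ (those for the double-exponential, logistic and $t$ kernels also appear in the proofs below), and scans a finite grid with the illustrative choice $l_2=2$. A grid scan is evidence, not a proof.

```python
import math

# Closed-form derivatives d/dz log chi(z) for one-dimensional kernels.
def dlog_normal(z):
    return -z

def dlog_double_exponential(z):
    # chi(z) = exp(-|z|) / 2; the derivative at 0 is never needed for B3
    return -1.0 if z > 0 else 1.0

def dlog_logistic(z):
    # chi(z) = exp(-z) / (1 + exp(-z))**2 gives d/dz log chi(z) = -tanh(z / 2)
    return -math.tanh(z / 2.0)

def make_dlog_t(nu):
    # chi_nu(z) is proportional to (1 + z^2 / nu) ** (-(nu + 1) / 2)
    def dlog_t(z):
        return -((nu + 1.0) / nu) * z / (1.0 + z * z / nu)
    return dlog_t

kernels = {
    "normal": dlog_normal,
    "double exponential": dlog_double_exponential,
    "logistic": dlog_logistic,
    "Cauchy (t, nu = 1)": make_dlog_t(1.0),
    "t, nu = 5": make_dlog_t(5.0),
}

def b3_holds(dlog, l2, z_max=50.0, n=2000):
    """Grid check of z * chi'(z) / chi(z) < -1 for l2 <= |z| <= z_max (assumed d = 1 form of B3)."""
    for i in range(n + 1):
        r = l2 + (z_max - l2) * i / n
        for z in (r, -r):
            if z * dlog(z) >= -1.0:
                return False
    return True

report = {name: b3_holds(d, l2=2.0) for name, d in kernels.items()}
```

The check also shows why a threshold $l_2$ is needed: for the double-exponential kernel, $z\chi'(z)/\chi(z)=-|z|$ fails the bound near the origin.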
1. Skew-normal density kernel

Consider the skew-normal kernel

$$\chi_\lambda(x)=2\cdot\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,\Phi(\lambda x),$$

where $\Phi$ denotes the standard normal distribution function and the skewness parameter $\lambda$ is given. We have the following result.

Theorem 4. Assume that the prior $\Pi$ satisfies B8. Let $f_0(x)$ be a continuous density on $\mathbb{R}$ satisfying Conditions B4, B5 and B6, and suppose there exists $\eta>0$ such that $\int|x|^{2(1+\eta)}f_0(x)\,dx<\infty$. Then $f_0\in\mathrm{KL}(\Pi^*)$.



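Proof. For Condition B3, we have

$$\frac{\chi_\lambda'(z)}{\chi_\lambda(z)}=-z+\frac{\lambda\Phi'(\lambda z)}{\Phi(\lambda z)}.$$

If $\lambda>0$, then as $z\to-\infty$, $\Phi'(\lambda z)/\Phi(\lambda z)=-\lambda z(1+o(1))$ by L'Hospital's rule, since $(\Phi'(\lambda z))'/(\Phi(\lambda z))'=-\lambda z$; and $\Phi'(\lambda z)/\Phi(\lambda z)\to0$ when $z\to\infty$. The case $\lambda<0$ is symmetric and $\lambda=0$ is immediate. Hence $z\chi_\lambda'(z)/\chi_\lambda(z)\to-\infty$ as $|z|\to\infty$, and Condition B3 is satisfied.

Condition B7 is satisfied, since $|\log\chi_\lambda(y)|\le c_\lambda(1+y^2)$ for some constant $c_\lambda>0$ depending only on $\lambda$ (by the same tail behavior of $\Phi$), so that

$$\int f_0(x)\left|\log\chi_\lambda(2|x|^\eta x)\right|dx\le c_\lambda\int f_0(x)\left(1+4|x|^{2(1+\eta)}\right)dx<\infty$$

and similarly

$$\int f_0(x)\left|\log\chi_\lambda\left(\frac{x-a}{b}\right)\right|dx\le c_\lambda\int f_0(x)\left(1+\frac{(x-a)^2}{b^2}\right)dx<\infty$$

for any $a$ and $b$. □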



Remark 4. With $\lambda=0$, Theorem 4 implies Theorem 3.2 of [29], since the normal density is a special case of the skew-normal.
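For the skew-normal kernel, the limit used for Condition B3 can be evaluated numerically. The sketch computes $z\chi_\lambda'(z)/\chi_\lambda(z)=z\{-z+\lambda\Phi'(\lambda z)/\Phi(\lambda z)\}$, evaluating $\Phi$ through `math.erfc` so that the left tail does not underflow for moderate $|\lambda z|$; the value $\lambda=2$ is an arbitrary illustration.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    # Phi(t) = erfc(-t / sqrt(2)) / 2 stays accurate far into the left tail
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def skew_normal_score(z, lam):
    """z * chi'(z) / chi(z) for chi(z) = 2 * phi(z) * Phi(lam * z)."""
    return z * (-z + lam * std_normal_pdf(lam * z) / std_normal_cdf(lam * z))

lam = 2.0
scores = {z: skew_normal_score(z, lam) for z in (-15.0, -5.0, 5.0, 15.0)}
```

On the side where $\lambda z\to-\infty$ the score decreases roughly like $-(1+\lambda^2)z^2$, faster than on the other side, in line with the L'Hospital argument above.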



2. Multivariate normal density kernel

Let $\chi(x)=(2\pi)^{-d/2}\prod_{i=1}^de^{-x_i^2/2}$, where $x=(x_1,\ldots,x_d)$. We have the following result.

Theorem 5. Assume that the prior $\Pi$ satisfies B8. Let $f_0(x)$ be a continuous density on $\mathbb{R}^d$ satisfying Conditions B4, B5 and B6, and suppose that $\int\|x\|^{2(1+\eta)}f_0(x)\,dx<\infty$ for some $\eta>0$. Then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. The proof of this theorem is very similar to the proof of Theorem 4, with $\lambda=0$ and some other minor modifications in all the steps except in verifying Condition B7. Note that

$$\int f_0(x)\left|\log\chi(2\|x\|^\eta x)\right|dx=\frac d2\log(2\pi)+2\int f_0(x)\|x\|^{2(1+\eta)}\,dx<\infty,$$

and similarly

$$\int f_0(x)\left|\log\chi\left(\frac{x-a}{b}\right)\right|dx=\frac d2\log(2\pi)+\int f_0(x)\frac{\|x-a\|^2}{2b^2}\,dx<\infty$$

for any $a$ and $b$. □

3. Double-exponential density kernel

Let $\chi(x)=\frac12e^{-|x|}$. We have the following result.

Theorem 6. Assume that the prior $\Pi$ satisfies B8. Let $f_0(x)$ be a continuous density on $\mathbb{R}$ satisfying B4, B5, B6 and $\int|x|^{1+\eta}f_0(x)\,dx<\infty$ for some $\eta>0$. Then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. Condition B3 is satisfied, since $\chi'(z)/\chi(z)=-1$ when $z>0$ and $\chi'(z)/\chi(z)=1$ when $z<0$, so that $z\chi'(z)/\chi(z)=-|z|$. Condition B7 follows easily from the fact that $|\log\chi(x)|$ is a linear function of $|x|$. □

4. Logistic density kernel

Let the kernel be $\chi(x)=e^{-x}/(1+e^{-x})^2$. We have the following result.

Theorem 7. Assume that the prior $\Pi$ satisfies B8. Let $f_0(x)$ be a continuous density on $\mathbb{R}$ satisfying B4, B5, B6 and $\int|x|^{1+\eta}f_0(x)\,dx<\infty$ for some $\eta>0$. Then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. Condition B3 is satisfied, since $\chi'(z)/\chi(z)\to-1$ as $z\to\infty$ and $\chi'(z)/\chi(z)\to1$ as $z\to-\infty$. Condition B7 is easily verified since the tails of $|\log\chi(x)|$ behave like $|x|$. □

5. $t_\nu$-density kernel

Let the kernel be given by

$$\chi_\nu(x)=\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac\nu2)}\,\frac{1}{(1+x^2/\nu)^{(\nu+1)/2}},$$

where the degrees of freedom $\nu$ is given. Let $\log_+u=\max(\log u,0)$. We have the following result.

Theorem 8. Assume that the prior $\Pi$ satisfies B8. Let $f_0(x)$ be a continuous density on $\mathbb{R}$ satisfying B4, B5, B6 and $\int\log_+|x|\,f_0(x)\,dx<\infty$. Then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. Condition B3 is satisfied, since $\chi_\nu'(z)/\chi_\nu(z)=-cz(1+z^2/\nu)^{-1}$, where $c=(\nu+1)/\nu$ is a positive constant.

Condition B7 can be verified by observing that $|\log\chi_\nu(x)|$ grows like $\log|x|$ as $|x|\to\infty$. □

Remark 5. Since the Cauchy density is the $t$-density with $\nu=1$, Theorem 8 applies to the Cauchy kernel.
4.2. Kernels with bounded support

Priors with kernels supported on $[0,1]$ are preferred for estimating densities supported on $[0,1]$. We study the KL property of such priors using Theorem 1. The following lemma will be used repeatedly in the proofs below.

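Lemma 4. For any density $f_0$ on $[0,1]$ and $\epsilon>0$, there exist $m>0$ and a density $f_1(x)\ge m>0$ such that $\Pi^*(\mathcal{K}_\epsilon(f_1))>0$ implies that $\Pi^*(\mathcal{K}_{2\epsilon+\sqrt\epsilon}(f_0))>0$.

Proof. If $f_0$ is bounded away from zero, take $f_1=f_0$. Otherwise, define

$$f_1(x)=\frac{\max(f_0(x),m)}{\int\max(f_0(u),m)\,du}.$$

By Lemma 5.1 in [12], we have $\mathcal{K}(f_0;f)\le(c+1)\log c+\mathcal{K}(f_1;f)+\sqrt{\mathcal{K}(f_1;f)}$, where $c=\int\max(f_0(x),m)\,dx$. Since $1\le c\le1+m$, $c\to1$ as $m\to0$. For any given $\epsilon>0$, there exists $m>0$ such that $(c+1)\log c<\epsilon$. Therefore $\Pi^*(\mathcal{K}_{2\epsilon+\sqrt\epsilon}(f_0))\ge\Pi^*(\mathcal{K}_\epsilon(f_1))$. □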
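The quantitative step in Lemma 4 is that $c=\int\max(f_0,m)\to1$ as $m\to0$, so that the penalty $(c+1)\log c$ vanishes. For the illustrative density $f_0(x)=2x$ on $[0,1]$, which is not bounded away from zero, $c=1+m^2/4$ for $m\le2$; the sketch below confirms this by the midpoint rule. The test density and grid size are assumptions of the example.

```python
import math

def c_of_m(f0, m, n=100000):
    """c = int_0^1 max(f0(x), m) dx by the midpoint rule."""
    step = 1.0 / n
    return step * sum(max(f0((i + 0.5) * step), m) for i in range(n))

def f0(x):
    # a density on [0, 1] that is not bounded away from zero
    return 2.0 * x

results = []
for m in (0.5, 0.1, 0.01):
    c = c_of_m(f0, m)
    results.append((m, c, (c + 1.0) * math.log(c)))  # (m, c, penalty (c + 1) log c)
```

The modified density $f_1=\max(f_0,m)/c$ is then bounded below by $m/c>0$, which is what the subsequent proofs need.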

6. Histogram density kernel

Let the kernel function be

$$K(x;\theta,m)=\begin{cases}m, & \text{both }x\text{ and }\theta\in\left(\frac{i-1}{m},\frac im\right]\text{ for some }1\le i\le m<\infty,\\ 0, & \text{otherwise}.\end{cases}$$

Consider a kernel mixture prior obtained by mixing both $\theta$ and $m$. We have the following result. An analogous result holds when only $\theta$ is mixed and $m$ is given a prior with infinite support.

Theorem 9. If $f_0(x)$ is a continuous density on $[0,1]$, and the weak support of $\Pi$ contains $\mathfrak{M}([0,1]\times\mathbb{N})$, then $f_0\in\mathrm{KL}(\Pi^*)$.

Proof. By Lemma 4, we only need to show that Conditions A1 and A3 are satisfied for a density $f_0$ that is bounded away from zero. For any $\epsilon > 0$, there exist an integer $m > 0$ and weights $\{w_1, w_2, \ldots, w_m\}$ such that $\sum_{i=1}^m w_i = 1$ and
$$\sup_{x\in[0,1]}\Big|f_0(x) - \sum_{i=1}^m w_i K(x; i/m, m)\Big| < \epsilon. \qquad (13)$$



To see this, define $w_i = f_0(i/m)/\sum_{j=1}^m f_0(j/m)$. By Riemann integrability of a continuous function, for any $\epsilon_1 > 0$, there exists $M_1 > 0$ such that for $m > M_1$, $|\frac1m\sum_{i=1}^m f_0(i/m) - 1| < \epsilon_1$. Since $f_0$ is continuous on a compact set, it is uniformly continuous. Hence, for any given $\epsilon_2 > 0$, there exists $M_2 > 0$ such that for $m > M_2$, $\sup_x|f_0(x) - \sum_{i=1}^m \frac{f_0(i/m)}{m} K(x; i/m, m)| < \epsilon_2$. Letting $A = \sum_{i=1}^m f_0(i/m)/m$, we have, for $\epsilon_1 \le 1/2$ (so that $A \ge 1/2$),
$$\Big|f_0(x) - \sum_{i=1}^m w_i K(x; i/m, m)\Big| \le \frac{|A-1|\,f_0(x) + \epsilon_2}{A} \le 2M\epsilon_1 + 2\epsilon_2,$$
where $M$ is an upper bound for $f_0$ on $[0, 1]$. Hence, by choosing $\epsilon_1$ and $\epsilon_2$ small enough, (13) holds for $m > M_3 = \max(M_1, M_2)$. Since we consider $f_0$ bounded away from zero, Condition A1 will be satisfied by choosing $m_\epsilon$ large enough and appropriate weights $\{w_1, \ldots, w_{m_\epsilon}\}$.
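The weights $w_i = f_0(i/m)/\sum_j f_0(j/m)$ behind (13) can be checked numerically. The Python sketch below is illustrative only. The density $f_0(x) = 1 + \frac12\cos(2\pi x)$ is a hypothetical choice that is continuous and bounded away from zero. The helper `histogram_mixture` evaluates $\sum_i w_i K(x; i/m, m)$, using the fact that only the bin containing $x$ contributes.

```python
import math

def f0(x):
    """Hypothetical continuous density on [0, 1], bounded away from zero."""
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * x)

def histogram_mixture(x, m):
    """sum_i w_i K(x; i/m, m) with w_i = f0(i/m) / sum_j f0(j/m), for x in (0, 1]."""
    values = [f0(i / m) for i in range(1, m + 1)]
    total = sum(values)
    i = min(max(math.ceil(x * m), 1), m)   # x lies in the bin ((i-1)/m, i/m]
    return m * values[i - 1] / total       # K(x; i/m, m) = m on that bin, 0 elsewhere

def sup_error(m, n_grid=2000):
    """Approximate sup_x |f0(x) - histogram_mixture(x, m)| on a grid of bin-interior points."""
    grid = ((k + 0.5) / n_grid for k in range(n_grid))
    return max(abs(f0(x) - histogram_mixture(x, m)) for x in grid)

for m in (10, 50, 200):
    print(m, sup_error(m))
```

The sup-distance shrinks roughly like $1/m$, as uniform continuity of $f_0$ suggests.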
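Let
$$\mathcal{W} = \Big\{P : P\Big(\Big(\frac{i - \frac12 - \delta_i}{m_\epsilon}, \frac{i - \frac12 + \delta_i}{m_\epsilon}\Big) \times \{m_\epsilon\}\Big) > w_i e^{-\epsilon} \text{ for } i = 1, \ldots, m_\epsilon\Big\},$$
where $0 < \delta_i < 1/4$ and $\epsilon > 0$. Since $\mathcal{W}$ is not empty and it is an open neighborhood of some distribution that belongs to the support of $\Pi$, $\Pi(\mathcal{W}) > 0$. For $P \in \mathcal{W}$ and a given $x$, with $i$ the index of the bin containing $x$, we have $f_P(x) > m_\epsilon w_i e^{-\epsilon} = e^{-\epsilon} f_{m_\epsilon}(x)$, where $f_{m_\epsilon} = \sum_i w_i K(\cdot; i/m_\epsilon, m_\epsilon)$, and hence $\int f_0 \log(f_{m_\epsilon}/f_P) < \epsilon$ for all $P \in \mathcal{W}$. □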
7. Triangular density kernel

Let the kernel function be
$$K(x; m, n) = \begin{cases} 2n - 2n^2x, & x \in (0, \frac1n), \ m = 0,\\ n - n^2|x - \frac{m}{n}|, & x \in (\frac{m-1}{n}, \frac{m+1}{n}), \ m = 1, 2, \ldots, n-1,\\ 2n + 2n^2(x-1), & x \in (1 - \frac1n, 1), \ m = n,\\ 0, & \text{otherwise.}\end{cases}$$
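As a sanity check on the kernel definition above, the following Python sketch (illustrative; the function names are ours) evaluates $K(x; m, n)$ and verifies numerically that each kernel integrates to one on $[0, 1]$.

```python
def triangular_kernel(x, m, n):
    """Triangular kernel K(x; m, n) on [0, 1] for m in {0, 1, ..., n}."""
    if m == 0:
        return 2 * n - 2 * n * n * x if 0 < x < 1 / n else 0.0
    if m == n:
        return 2 * n + 2 * n * n * (x - 1) if 1 - 1 / n < x < 1 else 0.0
    return n - n * n * abs(x - m / n) if (m - 1) / n < x < (m + 1) / n else 0.0

def kernel_mass(m, n, grid=100_000):
    """Midpoint-rule approximation of int_0^1 K(x; m, n) dx."""
    h = 1.0 / grid
    return sum(triangular_kernel((k + 0.5) * h, m, n) for k in range(grid)) * h

n = 5
print([round(kernel_mass(m, n), 8) for m in range(n + 1)])  # each entry should be close to 1
```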



Construct a kernel mixture prior by mixing both $m$ and $n$. We have the following result.

Theorem 10. Let $f_0(x)$ be a continuous density on $[0, 1]$, and let the weak support of $\Pi$ contain $\mathcal{M}([0, 1] \times \mathbb{N})$. Then $f_0 \in KL(\Pi^*)$.



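Proof. Since the mixing parameters are discrete, defining the weights $w_i$ from the values $f_0(i/n)$ as in the proof of Theorem 9, and letting $\mathcal{W} = \{P : P(i/n) > w_i e^{-\epsilon}, \text{ for } i = 1, 2, \ldots, n\}$, we can complete the proof as in Theorem 9. □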



8. Bernstein polynomial kernel 

In the literature, Bernstein polynomials have been used to estimate densities under both the frequentist and the Bayesian frameworks. The motivation for the prior comes from the fact that any bounded function on $[0, 1]$ can be approximated by Bernstein polynomials at each point of continuity of the function; see [22].

As in [23; 24], consider a prior $\Pi^*$ induced on $\mathcal{F}(\mathcal{X})$ by the map
$$(k, w_0, \ldots, w_k) \mapsto \sum_{j=0}^{k} w_j\,\beta(x; j+1, k-j+1),$$
where $\beta(x; a, b)$ denotes the beta density, and priors $(w_0, \ldots, w_k) \mid k \sim \Pi_k$ and $k \sim \mu$. Here $\mu$ is a discrete distribution supported on the set of all positive integers, and $\Pi_k$ is a distribution supported on the $(k+1)$-dimensional simplex $\mathcal{S}_k = \{(w_0, \ldots, w_k) : 0 \le w_j \le 1, j = 0, \ldots, k, \sum_j w_j = 1\}$. We can then rederive Theorem 2 of [26] from Theorem 1.

Theorem 11. If $f_0(x)$ is a continuous density on $[0, 1]$, $\mu(k) > 0$ for infinitely many $k = 1, 2, \ldots$, and $\Pi_k$ is fully supported on $\mathcal{S}_k$, then $f_0 \in KL(\Pi^*)$.

Proof. Though the prior is slightly different from Prior 2 in that $\Pi_k$ is allowed to depend on $k$, we can still use Theorem 1 by changing $\Pi(\mathcal{W}) > 0$ to $\Pi_k(\mathcal{W}) > 0$ for any given $k$. This follows since $k$ is discrete. By Lemma 4, we may assume that $f_0$ is bounded from below. Since Bernstein polynomials uniformly approximate any continuous density (see, for instance, Theorem 1 of [5]), it follows that Condition A1 is satisfied. Condition A3 holds by the discreteness of $k$ and the assumed positivity condition on its prior. The rest of the proof proceeds as before by considering all possible weights $w_j' > w_j e^{-\epsilon}$. □
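The uniform approximation used for Condition A1 can be illustrated with the classical choice of weights $w_j = F_0((j+1)/(k+1)) - F_0(j/(k+1))$. With these weights, $\sum_j w_j\,\beta(x; j+1, k-j+1)$ is the derivative of the Bernstein polynomial of degree $k+1$ of the distribution function $F_0$. The Python sketch below is an illustration under assumptions. Neither this choice of weights nor the density $f_0(x) = 1 + \frac12\cos(2\pi x)$ is taken from the source.

```python
import math

def f0(x):
    """Hypothetical continuous density on [0, 1]."""
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * x)

def F0(x):
    """Distribution function of f0."""
    return x + math.sin(2.0 * math.pi * x) / (4.0 * math.pi)

def beta_density(x, a, b):
    """Beta(a, b) density for positive integers a, b: (n+1) C(n, a-1) x^(a-1) (1-x)^(b-1), n = a+b-2."""
    n = a + b - 2
    return (n + 1) * math.comb(n, a - 1) * x ** (a - 1) * (1.0 - x) ** (b - 1)

def bernstein_density(x, k):
    """sum_{j=0}^k w_j beta(x; j+1, k-j+1) with w_j = F0((j+1)/(k+1)) - F0(j/(k+1))."""
    return sum((F0((j + 1) / (k + 1)) - F0(j / (k + 1))) * beta_density(x, j + 1, k - j + 1)
               for j in range(k + 1))

def sup_error(k, n_grid=400):
    """Approximate sup_x |f0(x) - bernstein_density(x, k)| on a midpoint grid."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    return max(abs(f0(x) - bernstein_density(x, k)) for x in grid)

for k in (20, 80, 200):
    print(k, sup_error(k))
```

The weights lie in the simplex $\mathcal{S}_k$ because they are increments of a distribution function. This is why full support of $\Pi_k$ is enough to put positive mass near them.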

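4.3. Kernels supported on $[0, \infty)$

9. Lognormal density kernel

Let the kernel function be $K(x; \theta, \phi) = \frac{1}{\sqrt{2\pi}\,\phi x} e^{-(\log x - \theta)^2/(2\phi^2)}$. Consider a type I or type II mixture prior based on this kernel.

Transform $x \mapsto e^y$ in the kernel function and in $f_0$. Suppose the model using $e^y K(e^y; \theta, \phi)$ as kernel function possesses the KL property at $e^y f_0(e^y)$. Then the corresponding model using $K(x; \theta, \phi)$ as kernel function possesses the KL property at $f_0(x)$. This is because the KL divergence is invariant under the change of variable $x = e^y$:
$$\int_0^\infty f_0(x)\log\frac{f_0(x)}{f(x)}\,dx = \int_{-\infty}^{\infty} e^y f_0(e^y)\log\frac{e^y f_0(e^y)}{e^y f(e^y)}\,dy.$$

For the lognormal kernel, we have the following result.

Theorem 12. Assume that the prior $\Pi$ satisfies B8. Let $f_0(x)$ be a continuous density on $\mathbb{R}^+$ satisfying
1. $f_0$ is nowhere zero except at $x = 0$ and bounded above by $M < \infty$;
2. $|\int_{\mathbb{R}^+} f_0(x)\log(xf_0(x))\,dx| < \infty$;
3. $\int_{\mathbb{R}^+} f_0(x)\log\frac{f_0(x)}{\phi_\delta(x)}\,dx < \infty$ for some $\delta > 0$, where $\phi_\delta(x) = \inf_{|t-x|<\delta} f_0(t)$;
4. there exists $\eta > 0$ such that $|\int_{\mathbb{R}^+} f_0(x)\,|\log x|^{2(1+\eta)}\,dx| < \infty$.
Then $f_0 \in KL(\Pi^*)$.

Proof. Considering the kernel function $\phi^{-1}\chi((y-\theta)/\phi) = \frac{1}{\sqrt{2\pi}\,\phi}e^{-(y-\theta)^2/(2\phi^2)}$, we can apply Theorem 4 with $\lambda = 0$ or Theorem 5 with $d = 1$. It follows from a change of variable that $g_0(y) = e^y f_0(e^y)$ satisfies B4, B5, B6 and $\int|y|^{2(1+\eta)}g_0(y)\,dy < \infty$ for some $\eta > 0$. □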
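The change-of-variable identity used for the lognormal kernel can be checked numerically. In the Python sketch below (illustrative; the two lognormal densities are hypothetical choices), the KL divergence is computed on the $x$-scale and on the $y = \log x$ scale. Both agree, and both match the closed form $\log(\sigma_2/\sigma_1) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac12$ for two normal densities.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def lognormal_pdf(x, theta, phi):
    """Lognormal kernel K(x; theta, phi), the density of e^Y with Y ~ N(theta, phi^2)."""
    return normal_pdf(math.log(x), theta, phi) / x

def kl_midpoint(f, g, lo, hi, n):
    """Midpoint-rule approximation of int_lo^hi f(t) log(f(t) / g(t)) dt."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        ft = f(t)
        if ft > 0.0:
            total += ft * math.log(ft / g(t))
    return total * h

def f0(x):   # hypothetical "true" density on (0, inf)
    return lognormal_pdf(x, 0.0, 1.0)

def f(x):    # hypothetical candidate density
    return lognormal_pdf(x, 0.5, 1.2)

def g0(y):   # e^y f0(e^y): the true density on the log scale
    return math.exp(y) * f0(math.exp(y))

def g(y):
    return math.exp(y) * f(math.exp(y))

kl_x = kl_midpoint(f0, f, 0.0, 100.0, 200_000)      # original scale (tail beyond 100 is negligible)
kl_y = kl_midpoint(g0, g, -12.0, 12.0, 100_000)     # log scale
closed = math.log(1.2) + (1.0 + 0.5 ** 2) / (2.0 * 1.2 ** 2) - 0.5
print(kl_x, kl_y, closed)
```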
10. Weibull density kernel

The Weibull density is a widely used kernel function. Ghosh and Ghosal [17] discussed a model using this density as kernel function and showed posterior consistency, which is useful in survival analysis. However, the assumption on the true density $f_0$ made there was quite strong. Here we establish the KL property with this kernel under very general assumptions.

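The Weibull kernel is given by $K(x; \theta, \phi) = \theta\phi^{-1}x^{\theta-1}e^{-x^\theta/\phi}$. We can transform this kernel using the map $x = e^y$ to
$$e^y K(e^y; \theta, \phi) = \theta\,W\big(\theta(y - \theta^{-1}\log\phi)\big),$$
where $W(z) = \exp[z - e^z]$, the location parameter is $\theta^{-1}\log\phi$ and the scale parameter is $\theta^{-1}$. We have the following result.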
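The transformation of the Weibull kernel into location-scale form can be verified pointwise. The Python sketch below is illustrative and uses arbitrary parameter values. It compares $e^y K(e^y; \theta, \phi)$ with $\theta W(\theta(y - \theta^{-1}\log\phi))$. It also checks that $W$ integrates to one; $W$ is the density of $\log U$ when $U$ is exponentially distributed with mean one.

```python
import math

def weibull_kernel(x, theta, phi):
    """K(x; theta, phi) = theta phi^{-1} x^{theta-1} exp(-x^theta / phi)."""
    return theta / phi * x ** (theta - 1.0) * math.exp(-x ** theta / phi)

def W(z):
    """W(z) = exp(z - e^z)."""
    return math.exp(z - math.exp(z))

def location_scale_form(y, theta, phi):
    """theta * W(theta * (y - log(phi) / theta)): scale 1/theta, location log(phi)/theta."""
    return theta * W(theta * (y - math.log(phi) / theta))

for y, theta, phi in ((-1.0, 0.7, 2.0), (0.3, 1.5, 0.4), (1.2, 3.0, 5.0)):
    lhs = math.exp(y) * weibull_kernel(math.exp(y), theta, phi)
    print(lhs, location_scale_form(y, theta, phi))

# W integrates to one over the real line (midpoint rule on [-40, 5])
n, lo, hi = 200_000, -40.0, 5.0
h = (hi - lo) / n
print(sum(W(lo + (i + 0.5) * h) for i in range(n)) * h)
```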

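Theorem 13. Let $f_0(x)$ be a continuous density on $\mathbb{R}^+$ satisfying
1. $f_0$ is nowhere zero except at $x = 0$ and bounded above by $M < \infty$;
2. $|\int_{\mathbb{R}^+} f_0(x)\log(f_0(x))\,dx| < \infty$;
3. $\int_{\mathbb{R}^+} f_0(x)\log\frac{f_0(x)}{\phi_\delta(x)}\,dx < \infty$ for some $\delta > 0$, where $\phi_\delta(x) = \inf_{|t-x|<\delta} f_0(t)$;
4. there exists $\eta > 0$ such that $e^{|\log x|^{1+\eta}}$ is $f_0$-integrable;
5. the weak support of $\Pi$ contains $\mathcal{M}(\mathbb{R}^+ \times \mathbb{R}^+)$.
Then $f_0 \in KL(\Pi^*)$.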

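Proof. We need to verify Conditions B3-B7 for the kernel $W(\cdot)$ and the true density $e^y f_0(e^y)$. Condition B3 is satisfied, since we have $\frac{d}{dz}\log W(z) = 1 - e^z$. To verify Condition B7, observe that, after the change of variable $x = e^y$, Condition 4 of this theorem implies
$$\int_{\mathbb{R}} e^y f_0(e^y)\,e^{|y|^{1+\eta}}\,dy < \infty$$
and hence, since $|\log W(y)| = |y - e^y| \le |y| + e^y$,
$$\int_{\mathbb{R}} e^y f_0(e^y)\,|\log W(y)|\,dy < \infty.$$
□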



11. Gamma density kernel

The gamma density is one of the most widely used kernel functions for density estimation on $[0, \infty)$. Hanson [18] discussed a model using the gamma density as kernel, with a hierarchical structure that has as many stages as the most general one we discussed in Section 1. Chen [4] and Bouezmarni and Scaillet [3] discussed a mixture of gamma model with a different parametrization.

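Let $K(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}$ be the kernel function. Set
$$\phi_\delta(x) = \begin{cases}\inf_{t\in[x, x+\delta)} f_0(t), & 0 < x < 1,\\ \inf_{t\in(x-\delta, x]} f_0(t), & x \ge 1.\end{cases} \qquad (14)$$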
Theorem 14. Assume that the weak support of the prior $\Pi$ is $\mathcal{M}(\mathbb{R}^+ \times \mathbb{R}^+)$. Let $f_0(x)$ be a continuous and bounded density on $[0, \infty)$ satisfying B4, B5 and

B6* $\int f_0(x)\log\frac{f_0(x)}{\phi_\delta(x)}\,dx < \infty$ for some $\delta > 0$;

B7* there exists $\eta > 0$ such that $\int\max(x^{-2-\eta}, x^{2+\eta})f_0(x)\,dx < \infty$.

Then $f_0 \in KL(\Pi^*)$.

Proof. We use $K_m(x; a)$ to denote $K(x; a, m^{-1})$. Let
$$f_m(x) = t_m\int_2^{1+m^2} K_m(x; a)\,m^{-1}f_0((a-1)/m)\,da, \qquad (15)$$
where $t_m = (\int_{m^{-1}}^{m} f_0(s)\,ds)^{-1}$. Let $P_m$ denote $F_m \times \delta(m^{-1})$, where $F_m$ is the probability measure corresponding to $t_m m^{-1}f_0((a-1)/m)\,\mathbb{1}(a \in [2, 1+m^2])$ as a density function for $a$, and $\mathbb{1}(\cdot)$ is the indicator function. Obviously, $P_m$ is compactly supported and $f_m(x) = f_{P_m}(x)$. Let $F_{f_m}$ be the probability measure corresponding to $f_m$. By Lemma 5 in the Appendix, $\int f_0(x)\log\frac{f_0(x)}{f_m(x)}\,dx \to 0$ as $m \to \infty$, which implies that Condition A1 is satisfied.

To complete the proof, we show that Condition A3 is satisfied by verifying the conditions of Lemma 3. For any given $\epsilon > 0$, let $D = [2, 1+m_\epsilon^2] \times \{m_\epsilon^{-1}\}$, where $m_\epsilon$ is such that $\int f_0(x)\log\frac{f_0(x)}{f_{m_\epsilon}(x)}\,dx < \epsilon$. To verify Condition A7, it is sufficient to show that $\int f_0(x)|\log f_{m_\epsilon}(x)|\,dx < \infty$ and $\int f_0(x)|\log\inf_{(\alpha,\beta)\in D}K(x; \alpha, \beta)|\,dx < \infty$. Based on expressions (19), (20) and (25) in the Appendix, we have
$$\log\inf_{(\alpha,\beta)\in D}K(x; \alpha, \beta) = \log\big(\min\{K(x; 1+m_\epsilon^2, m_\epsilon^{-1}), K(x; 2, m_\epsilon^{-1})\}\big)$$
for any $0 < x < \infty$. Hence
$$\Big|\log\inf_{(\alpha,\beta)\in D}K(x; \alpha, \beta)\Big| \le xm_\epsilon + m_\epsilon^2|\log x| + |\log(\Gamma(m_\epsilon^2+1)m_\epsilon^{-(m_\epsilon^2+1)})| + |\log(m_\epsilon^2)|.$$
By Condition B7*, we have that $\int|\log\inf_{(\alpha,\beta)\in D}K(x; \alpha, \beta)|f_0(x)\,dx < \infty$. Further, $\log f_{m_\epsilon}(x)$ is also $f_0$-integrable by a similar argument. Condition A8 is obviously satisfied. Condition A9 is satisfied by letting $E$ be a large enough compact set containing $D$. This proves the theorem. □
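The pointwise convergence $f_m \to f_0$ behind Condition A1 (Lemmas 5 and 6) can be illustrated numerically. The Python sketch below evaluates (15) after the substitution $v = (a-1)/m$, that is, $f_m(x) = t_m\int_{1/m}^{m}K(x; mv+1, m^{-1})f_0(v)\,dv$, using the trapezoidal rule. The true density $f_0(v) = e^{-v}$ is a hypothetical choice.

```python
import math

def gamma_kernel(x, shape, scale):
    """Gamma density with the given shape and scale, evaluated on the log scale for stability."""
    log_k = ((shape - 1.0) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_k)

def f_m(x, m, f0, n_grid=20_000):
    """t_m * int_{1/m}^{m} K(x; m v + 1, 1/m) f0(v) dv, with t_m = 1 / int_{1/m}^{m} f0."""
    lo, hi = 1.0 / m, float(m)
    h = (hi - lo) / n_grid
    num = 0.0   # trapezoidal sum for int K f0
    mass = 0.0  # trapezoidal sum for int f0 = 1 / t_m
    for i in range(n_grid + 1):
        v = lo + i * h
        w = 0.5 if i in (0, n_grid) else 1.0
        fv = f0(v)
        num += w * gamma_kernel(x, m * v + 1.0, 1.0 / m) * fv
        mass += w * fv
    return num / mass   # the step h cancels in the ratio

def f0(v):
    return math.exp(-v)

for m in (5, 20, 50):
    print(m, f_m(1.0, m, f0), math.exp(-1.0))
```

Both the shape range $[2, 1+m^2]$ and the scale $m^{-1}$ change with $m$. This is why Lemma 7 allows the intervals $A_m$ to vary with $m$.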

12. Inverse gamma density kernel

The inverse gamma density function is defined as $h(x; a, b) = \frac{b^a}{\Gamma(a)}x^{-a-1}e^{-b/x}$. We consider the reparametrization $K(x; k, z) = h(x; k, kz)$ as the kernel function and construct mixture priors. Let $\phi_\delta$ be defined as in (14). We have the following result.

Theorem 15. Assume that the weak support of the prior $\Pi$ contains $\mathcal{M}(\mathbb{R}^+ \times \mathbb{R}^+)$. Let $f_0(x)$ be a continuous and bounded density on $[0, \infty)$ satisfying B4, B5, B6* and B7*. Then $f_0 \in KL(\Pi^*)$.
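Proof. Observe that
$$\int_0^\infty h(x; k, kz)\,dP(z) = \int_0^\infty\frac{(kz)^k}{\Gamma(k)}x^{-(k+1)}e^{-kz/x}\,dP(z) = \int_0^\infty\frac{\Gamma(k+1)}{k\,\Gamma(k)}\,g(z; k+1, x/k)\,dP(z) = \int_0^\infty g(z; k+1, x/k)\,dP(z),$$
where $g(\cdot; k+1, x/k)$ is the gamma density with shape $k+1$ and scale $x/k$. By Proposition 3.1 in [3], we have for any $x \in [0, \infty)$, $\int g(z; k+1, x/k)f_0(z)\,dz \to f_0(x)$ as $k \to \infty$, i.e., $\int K(x; k, z)f_0(z)\,dz \to f_0(x)$ as $k \to \infty$.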
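The identity $h(x; k, kz) = g(z; k+1, x/k)$ and the convergence $\int K(x; k, z)f_0(z)\,dz \to f_0(x)$ can be checked numerically. In the Python sketch below (illustrative), $f_0(z) = e^{-z}$ is a hypothetical choice. For it, the mixture has the closed form $(1 + x/k)^{-(k+1)}$, which is the Laplace transform of the gamma$(k+1, x/k)$ distribution evaluated at 1.

```python
import math

def inv_gamma(x, a, b):
    """Inverse gamma density h(x; a, b) = b^a / Gamma(a) x^{-a-1} e^{-b/x}."""
    return math.exp(a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(x) - b / x)

def gamma_pdf(z, shape, scale):
    """Gamma density g(z; shape, scale)."""
    return math.exp((shape - 1) * math.log(z) - z / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def mixture(x, k, f0, z_max=20.0, n=200_000):
    """Midpoint rule for int_0^inf K(x; k, z) f0(z) dz, with K(x; k, z) = h(x; k, k z)."""
    h = z_max / n
    return h * sum(inv_gamma(x, k, k * z) * f0(z)
                   for z in ((i + 0.5) * h for i in range(n)))

# as a function of z, the kernel is the gamma(k + 1, x / k) density
for x, k, z in ((1.0, 5, 0.7), (2.5, 50, 2.0), (0.3, 10, 0.5)):
    print(inv_gamma(x, k, k * z), gamma_pdf(z, k + 1, x / k))

x, k = 1.0, 50
print(mixture(x, k, lambda z: math.exp(-z)), (1 + x / k) ** (-(k + 1)), math.exp(-x))
```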

Set $f_m(x) = t_m\int_{m^{-1}}^{m}K(x; m, z)f_0(z)\,dz$, where $t_m = (\int_{m^{-1}}^{m}f_0(z)\,dz)^{-1}$. Let $P_m$ be $F_m \times \delta(m)$, where $F_m$ is the probability measure corresponding to $t_m f_0(z)\,\mathbb{1}(z \in [m^{-1}, m])$.

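Observe that $\frac{\partial}{\partial z}\log h(x; m, mz) = m(z^{-1} - x^{-1})$, so that $h(x; m, mz)$ is increasing in $z$ for $z < x$ and decreasing for $z > x$. Hence, for $z \in [m^{-1}, m]$,
$$h(x; m, mz) \ge \begin{cases}\frac{1}{\Gamma(m)}x^{-m-1}e^{-1/x}, & x > m,\\ \frac{m^{2m}}{\Gamma(m)}x^{-m-1}e^{-m^2/x}, & x < m^{-1}.\end{cases}$$
The derivatives with respect to $m$ of the logarithms of the expressions on the RHS of the above relation are given by
$$\frac{d}{dm}\log\Big(\frac{x^{-m-1}e^{-1/x}}{\Gamma(m)}\Big) = -\log x - \psi_0(m) < 0$$
for $x > m$, and
$$\frac{d}{dm}\log\Big(\frac{m^{2m}x^{-m-1}e^{-m^2/x}}{\Gamma(m)}\Big) = 2\log m + 2 - \log x - \frac{2m}{x} - \psi_0(m) < 0$$
for $x < m^{-1}$, where $\psi_0(\cdot)$ is the digamma function; its details are given in the proof of Lemma 5 in the Appendix. Therefore, replacing $m$ by $x$ in the first case and by $x^{-1}$ in the second,
$$h(x; m, mz) \ge \begin{cases}\frac{1}{\Gamma(x)}x^{-x-1}e^{-1/x}, & x > m,\\ \frac{1}{\Gamma(x^{-1})}x^{-3x^{-1}-1}e^{-x^{-3}}, & x < m^{-1},\end{cases}$$
and hence
$$f_m(x) \ge \begin{cases}\frac{1}{\Gamma(x)}x^{-x-1}e^{-1/x}, & x > m,\\ \frac{1}{\Gamma(x^{-1})}x^{-3x^{-1}-1}e^{-x^{-3}}, & x < m^{-1}.\end{cases}$$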
Now, for $m^{-1} < x < 1$,
$$f_m(x) \ge \phi_\delta(x)\int_x^{x+\delta}\frac{(mz)^m}{\Gamma(m)}x^{-m-1}e^{-mz/x}\,dz = \phi_\delta(x)\int_m^{(1+\delta/x)m}\frac{s^m e^{-s}}{\Gamma(m+1)}\,ds = \phi_\delta(x)\big[\mathrm{Ga}((\delta/x+1)m) - \mathrm{Ga}(m)\big], \qquad (16)$$
where $\mathrm{Ga}$ is the cumulative distribution function (c.d.f.) of the gamma distribution with parameters $(m+1, 1)$. For large $m$, the bracketed expression is bounded below by $\{\Phi(1+\delta/x) - \Phi(1)\}/2$ in view of the central limit theorem. Similarly, for $1 \le x < m$ and large $m$, the corresponding lower bound in (16) is $\{\Phi(1) - \Phi(1-\delta/x)\}/2$.
Let
$$C(x) = \begin{cases}\{\Phi(1+\delta/x) - \Phi(1)\}/2, & 0 < x < 1,\\ \{\Phi(1) - \Phi(1-\delta/x)\}/2, & x \ge 1.\end{cases}$$
Now we have that $f_m(x) \ge \phi_\delta(x)C(x)$ and $\int|\log C(x)|f_0(x)\,dx < \infty$. As in the proof of Lemma 5 in the Appendix, this gives a lower bound of $f_m(x)$ for $m^{-1} < x < m$.

Now we have a lower bound of $f_m(x)$ for every $x$ and all large $m$. Along the same lines as for the gamma kernel in Lemma 5, we can show that $\int f_0(x)\log\frac{f_0(x)}{f_m(x)}\,dx \to 0$ as $m \to \infty$, which implies that Condition A1 is satisfied. Similarly, as in the proof of Theorem 14, we can show that Condition A3 is also satisfied. □

13. Exponential density kernel

Consider a mixture prior based on the exponential kernel. Let $K(x; \theta) = \theta e^{-\theta x}$. Recall that a function $\varphi$ on $\mathbb{R}^+$ is completely monotone if it possesses derivatives $\varphi^{(n)}$ of all orders and $(-1)^n\varphi^{(n)}(x) \ge 0$ for $x > 0$. Let $\bar F_0(x) = 1 - F_0(x)$, where $F_0$ is the distribution function corresponding to the density function $f_0$. We have the following result.

Theorem 16. If $f_0$ is a continuous density on $\mathbb{R}^+$, $x$ and $|\log f_0(x)|$ are $f_0$-integrable, $\bar F_0(x)$ is completely monotone, and the weak support of $\Pi$ is $\mathcal{M}(\mathbb{R}^+)$, then $f_0 \in KL(\Pi^*)$.

Proof. Since $\bar F_0(x)$ is completely monotone, by Theorem 1 in [8, Chapter XIII.4], it is the Laplace transform of a probability distribution $P_0$, i.e., $\bar F_0(x) = \int_0^\infty e^{-\theta x}\,dP_0(\theta)$. Taking derivatives on both sides,
$$f_0(x) = -\frac{d}{dx}\int_0^\infty e^{-\theta x}\,dP_0(\theta) = \int_0^\infty\theta e^{-\theta x}\,dP_0(\theta) = \int_0^\infty K(x; \theta)\,dP_0(\theta).$$
Hence, under the conditions of this theorem, the true density is a mixture of the kernel.

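Let $P_a(A) = P_0(A \cap [a^{-1}, a])/P_0([a^{-1}, a])$ for any $A \subset \mathbb{R}^+$, and let $f_{P_a}$ denote $\int K(x; \theta)\,dP_a(\theta)$. For any $x \in (0, \infty)$, $\int K(x; \theta)\,dP_a(\theta) \to \int K(x; \theta)\,dP_0(\theta)$ as $a \to \infty$. Hence, $\log\frac{f_0(x)}{f_{P_a}(x)} \to 0$ pointwise as $a \to \infty$.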
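The mixture representation and the truncation $P_a$ can be illustrated with a concrete completely monotone survival function. The Python sketch below rests on an assumption: $P_0$ is taken to be the gamma$(2, 1)$ distribution, so that $\bar F_0(x) = (1+x)^{-2}$ and $f_0(x) = 2(1+x)^{-3}$. It computes $f_{P_a}(x)$ by the midpoint rule and shows it approaching $f_0(x)$ as $a$ grows.

```python
import math

ALPHA = 2.0  # hypothetical: P0 = gamma(ALPHA, 1), so 1 - F0(x) = (1 + x)^(-ALPHA)

def p0_density(theta):
    """Density of the mixing distribution P0 = gamma(ALPHA, 1)."""
    return theta ** (ALPHA - 1.0) * math.exp(-theta) / math.gamma(ALPHA)

def f0(x):
    """f0(x) = int theta e^{-theta x} dP0(theta) = ALPHA (1 + x)^(-ALPHA - 1)."""
    return ALPHA * (1.0 + x) ** (-ALPHA - 1.0)

def f_pa(x, a, n=20_000):
    """f_{P_a}(x): exponential kernel mixed over P0 restricted to [1/a, a] and renormalized."""
    lo, hi = 1.0 / a, a
    h = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        w = p0_density(theta)
        num += theta * math.exp(-theta * x) * w
        den += w
    return num / den

for a in (2.0, 10.0, 50.0):
    print(a, f_pa(1.0, a), f0(1.0))
```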
Since $|\log f_0(x)|$ is $f_0$-integrable, to obtain $\int_0^\infty f_0(x)\log\frac{f_0(x)}{f_{P_a}(x)}\,dx \to 0$ as $a \to \infty$ it suffices to show that $|\log f_{P_a}(x)|$ is not greater than an $f_0$-integrable function, and then apply the dominated convergence theorem (DCT).

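Note that $f_{P_a}(x) = P_0([a^{-1}, a])^{-1}\int_{a^{-1}}^{a}K(x, \theta)\,dP_0(\theta)$ and $f_0(x) = \int K(x, \theta)\,dP_0(\theta)$. Hence there exists $a_0 > 0$ such that for $a > a_0$, $f_{P_a}(x) < 2f_0(x)$.

Observe that for given $x$, $K(x, \theta)$ is increasing on $(0, x^{-1}]$ and decreasing on $(x^{-1}, \infty)$ as a function of $\theta$. We obtain a lower bound of $f_{P_a}(x)$ by using this property. First, let $\theta_1$, $\theta_2$ and $\theta_3$ be such that $P_0((0, \theta_1)) = q_1 > 0$, $P_0((\theta_1, \theta_2)) = q_2 > 0$, $P_0((\theta_2, \theta_3)) = q_3 > 0$ and $P_0((\theta_3, \infty)) = q_4 > 0$. Choose $a$ sufficiently large, such that $a^{-1} < \theta_1$ and $a > \theta_3$.

For $x > \theta_2^{-1}$, $K(x, \theta)$ is decreasing as a function of $\theta$ on $[\theta_2, \infty)$. Hence $f_{P_a}(x) \ge \theta_3 e^{-\theta_3 x}q_3$. For $0 < x < \theta_2^{-1}$, $K(x; \theta)$ is increasing as a function of $\theta$ on $(0, \theta_2)$. Hence, $f_{P_a}(x) \ge \theta_1 e^{-\theta_1 x}q_2$.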

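Therefore, for $a$ large, we have
$$2f_0(x) > f_{P_a}(x) > \begin{cases}\theta_3 e^{-\theta_3 x}q_3, & x > 1/\theta_2,\\ \theta_1 e^{-\theta_1 x}q_2, & x < 1/\theta_2.\end{cases}$$
Hence,
$$\Big|\log\frac{f_0(x)}{f_{P_a}(x)}\Big| \le |\log f_0(x)| + |\log f_{P_a}(x)|,$$
and
$$|\log f_{P_a}(x)| \le \max\{\log 2 + |\log f_0(x)|,\ |\log(\theta_3 q_3)| + |x\theta_3|,\ |\log(\theta_1 q_2)| + |x\theta_1|\}.$$
Since $\log f_0(x)$ and $x$ are both $f_0$-integrable, by the DCT we have
$$\int f_0(x)\log\frac{f_0(x)}{f_{P_a}(x)}\,dx \to 0$$
as $a \to \infty$. Thus Condition A1 is satisfied.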

To show that Condition A3 is satisfied, we verify that Conditions A7-A9 are satisfied. For any $\epsilon > 0$, there exists $a > 1$ such that $\int f_0(x)\log\frac{f_0(x)}{f_{P_a}(x)}\,dx < \epsilon$. From above, we have that $\int|\log(f_{P_a}(x))|f_0(x)\,dx < \infty$. Let $D = [a^{-1}, a]$; then $|\log(\inf_{\theta\in D}K(x; \theta))| \le xa^{-1} + xa + \log a$. Since $x$ is $f_0$-integrable, $\log(\inf_{\theta\in D}K(x; \theta))$ is $f_0$-integrable. Hence, Condition A7 is satisfied. Condition A8 holds obviously. For Condition A9, the uniform equicontinuity holds for any compact $E$. Without loss of generality, let $C = [c_1, c_2]$ and $E = [(ab)^{-1}, ab]$, where $b > 1$, and hence $E \supset D$. Choose $b$ such that $(ab)^{-1} < c_2^{-1}$ and $ab > c_1^{-1}$. Then, by the monotonicity property of the exponential density function, $\sup\{K(x, \theta) : x \in C, \theta \in E^c\} = \max((ab)^{-1}e^{-c_1/(ab)}, ab\,e^{-abc_1}) \to 0$ as $b \to \infty$, so A9 is satisfied. Thus, $f_0 \in KL(\Pi^*)$.

□

14. Scaled uniform density kernel

Let the true density $f_0$ be supported on $\mathcal{X} = \mathbb{R}^+$, and consider a mixture prior based on the scaled uniform kernel $K(x; \theta) = \theta^{-1}\mathbb{1}\{0 < x < \theta\}$.
Theorem 17. If $f_0(x)$ is a continuous and decreasing density function on $\mathbb{R}^+$ such that $\int f_0|\log f_0| < \infty$ and the weak support of $\Pi$ is $\mathcal{M}(\mathbb{R}^+)$, then $f_0 \in KL(\Pi^*)$.

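Proof. We will show that Conditions A1 and A3 are satisfied. Let $x_1 > 0$ and $x_2 > 0$ be such that $f_0(x_1) = a$ and $f_0(x_2) = b$, where $0 < b < 1$ and $b < a < f_0(0)$. For given $m$, let $m_1$ and $m_2$ be such that $\frac{m_1}{m} < x_1 \le \frac{m_1+1}{m}$ and $\frac{m_2}{m} < x_2 \le \frac{m_2+1}{m}$. Let
$$w_i^* = \begin{cases}\frac{i}{m}\{f_0(\frac{i}{m}) - f_0(\frac{i+1}{m})\}, & 1 \le i < m_1,\\ \frac{m_1}{m}\{f_0(\frac{m_1}{m}) - a\}, & i = m_1,\\ \frac{m_1+1}{m}\{a - f_0(\frac{m_1+1}{m})\}, & i = m_1+1,\\ \frac{i}{m}\{f_0(\frac{i-1}{m}) - f_0(\frac{i}{m})\}, & i \ge m_1+2.\end{cases}$$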



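We define $f_m^*(x) = \sum_{i=1}^\infty w_i^*K(x; i/m)$. By the continuity of $f_0$, $f_m^*$ converges to $f_0$ pointwise. Note that $f_m^*$ is not a p.d.f. Let $w_i$, for $m_1 < i \le m_2$, be the weights $w_i^*$ adjusted in the same direction so that $\sum_i w_i = 1$, and let $w_i = w_i^*$ for all other $i$'s. Then the $w_i$'s are positive and $\sum_i w_i = 1$. Let $f_m(x) = \sum_i w_iK(x; i/m)$. Observe that, since $K(x; i/m) \le m/m_1$ for $i > m_1$,
$$|f_m(x) - f_m^*(x)| = \Big|\sum_{i=m_1+1}^{m_2}(w_i - w_i^*)K(x; i/m)\Big| \le \frac{m}{m_1}\Big|1 - \sum_{i=1}^\infty w_i^*\Big| \to 0 \qquad (17)$$
as $m \to \infty$, by the definition of the Riemann integral. Thus $f_m$ converges to $f_0$ pointwise. Taking $m$ large such that the expression on the RHS of (17) is less than $a/2$, we have that
$$|\log f_m(x)| \le \begin{cases}\max(\log 2 + |\log f_0(x)|, |\log a - \log 2|), & 0 < x \le \frac{m_1+1}{m},\\ \max(|\log a|, |\log f_0(x_2+1)|), & \frac{m_1+1}{m} < x \le \frac{m_2+1}{m},\\ |\log f_0(x)|, & x > \frac{m_2+1}{m}.\end{cases}$$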
Since $|\log f_0(x)|$ is $f_0$-integrable, by using the DCT, we have $\int f_0\log(f_0/f_m) \to 0$ as $m \to \infty$. Condition A3 is satisfied by a similar argument as in the proof of Theorem 9. □
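The idea behind the weights $w_i^*$, namely that a decreasing density is a mixture of scaled uniform kernels, can be illustrated with simpler "staircase" weights. These are $w_i^* = \frac{i}{m}\{f_0(\frac{i-1}{m}) - f_0(\frac{i}{m})\}$ for all $i$, without the adjustments near $x_1$ used in the proof. With them, $\sum_i w_i^*K(x; i/m)$ telescopes to $f_0(\lfloor mx\rfloor/m)$. The Python sketch below assumes the hypothetical density $f_0(x) = e^{-x}$ and truncates the series at a large index.

```python
import math

def f0(x):
    """Hypothetical decreasing density on R+."""
    return math.exp(-x)

def uniform_kernel(x, theta):
    """Scaled uniform kernel K(x; theta) = theta^{-1} 1{0 < x < theta}."""
    return 1.0 / theta if 0.0 < x < theta else 0.0

def staircase_weights(m, i_max):
    """w_i^* = (i/m) [f0((i-1)/m) - f0(i/m)] for i = 1, ..., i_max."""
    return [(i / m) * (f0((i - 1) / m) - f0(i / m)) for i in range(1, i_max + 1)]

def staircase_mixture(x, m, i_max):
    """sum_i w_i^* K(x; i/m); equals f0(floor(m x)/m) - f0(i_max/m) when m x is not an integer."""
    w = staircase_weights(m, i_max)
    return sum(wi * uniform_kernel(x, i / m) for i, wi in enumerate(w, start=1))

for m in (10, 100):
    i_max = 40 * m                      # truncation: f0(i_max/m) = e^{-40} is negligible
    total = sum(staircase_weights(m, i_max))
    print(m, total, staircase_mixture(0.537, m, i_max), f0(0.537))
```

The total weight exceeds one by a Riemann-sum error of order $1/m$. This is why the proof rescales part of the weights to obtain a proper mixing distribution.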



Appendix A 



Lemma 5. Let $f_m(x)$ be defined as in (15). If the conditions of Theorem 14 are satisfied, then $\int f_0(x)\log\frac{f_0(x)}{f_m(x)}\,dx \to 0$ as $m \to \infty$.



Proof. First, we derive lower bounds of $f_m(x)$ for $x$ in different intervals. Observe that
$$\frac{\partial}{\partial a}\log(K_m(x; a)) = \log m + \log x - \psi_0(a), \qquad (18)$$
where $\psi_0(z) = \frac{d}{dz}\log(\Gamma(z))$ is the digamma function. Also $\psi_0(z)$ is continuous and monotone increasing for $z \in (0, \infty)$, $\psi_0(z+1) = \psi_0(z) + \frac1z$, and $\psi_0(z) - \log(z-1) \to 0$ as $z \to \infty$; see [2, pp. 549-555] for details.

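For $x < m^{-1}$, $\log(mx) < 0$ and $\psi_0(a) \ge \psi_0(2) = 1 - \gamma \approx 0.42$ for $a \in [2, 1+m^2]$, and hence $\frac{\partial}{\partial a}\log(K_m(x; a)) < 0$. For $x > m + m^{-1}$ and $a \in [2, 1+m^2]$, $\log(mx) > \log(m^2+1) > \psi_0(1+m^2) \ge \psi_0(a)$, and hence $\frac{\partial}{\partial a}\log(K_m(x; a)) > 0$. Thus, replacing $a$ by $1+m^2$ in the integrand, we obtain a lower bound for $f_m(x)$, $x < m^{-1}$, as
$$f_m(x) \ge t_m\int_2^{1+m^2}\frac{x^{m^2}e^{-xm}m^{m^2+1}}{\Gamma(m^2+1)}\,m^{-1}f_0((a-1)/m)\,da = \frac{x^{m^2}e^{-xm}m^{m^2+1}}{\Gamma(m^2+1)}. \qquad (19)$$
Similarly, replacing $a$ by 2 in the integrand, we obtain that for $x > m + m^{-1}$,
$$f_m(x) \ge xe^{-xm}m^2. \qquad (20)$$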

Consider the RHS of equation (19). For $x < m^{-1}$, we have
$$\frac{d}{dm}\log\Big(\frac{x^{m^2}e^{-xm}m^{m^2+1}}{\Gamma(m^2+1)}\Big) = 2m[\log(xm) - \psi_0(m^2+1)] + \frac{m^2+1}{m} - x < 0$$
for all $m$ sufficiently large. Consider the RHS of equation (20): for $x > m + m^{-1}$, we have $\frac{d}{dm}(xe^{-xm}m^2) = xme^{-xm}(2 - xm) < 0$.

Hence, replacing $m$ by $x^{-1}$ on the RHS of (19), we obtain a lower bound of $f_m(x)$ for $x < m^{-1}$ as below,
$$f_m(x) \ge \frac{x^{m^2}e^{-xm}m^{m^2+1}}{\Gamma(m^2+1)} \ge \frac{1}{ex\,\Gamma(x^{-2}+1)}, \qquad (21)$$
and similarly, replacing $m$ by $x$ on the RHS of (20), we obtain that for $x > m + m^{-1}$,
$$f_m(x) \ge xe^{-xm}m^2 \ge e^{-x^2}x^3. \qquad (22)$$

Now, we consider $f_m(x)$ for $m^{-1} \le x \le m + m^{-1}$. Let $\delta > 0$ be fixed and $v = (a-1)/m$. For $m$ large,
$$f_m(x) = t_m\int_{m^{-1}}^{m}K_m(x; mv+1)f_0(v)\,dv \ge \begin{cases}\phi_\delta(x)\int_x^{x+\delta}K_m(x; mv+1)\,dv, & x < 1,\\ \phi_\delta(x)\int_{x-\delta}^{x}K_m(x; mv+1)\,dv, & x \ge 1,\end{cases} \quad \ge C(x)\phi_\delta(x),$$
where $C(x)$ is given in Lemma 8.

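Now we have the lower bound of the function $f_m(x)$,
$$f_m(x) \ge \begin{cases}\min\big(C(x)\phi_\delta(x), [ex\Gamma(x^{-2}+1)]^{-1}\big), & 0 < x < R^{-1},\\ C(x)\phi_\delta(x), & R^{-1} \le x \le R,\\ \min\big(C(x)\phi_\delta(x), e^{-x^2}x^3\big), & R < x,\end{cases} \qquad (23)$$
where $0 < R < m$. Hence, we have that
$$\log\frac{f_0(x)}{f_m(x)} \le \xi(x) := \begin{cases}\log\frac{f_0(x)}{C(x)\phi_\delta(x)}, & R^{-1} \le x \le R,\\ \max\Big\{\log\frac{f_0(x)}{C(x)\phi_\delta(x)}, \log\big(ex\Gamma(x^{-2}+1)f_0(x)\big)\Big\}, & 0 < x < R^{-1},\\ \max\Big\{\log\frac{f_0(x)}{C(x)\phi_\delta(x)}, \log\frac{f_0(x)}{e^{-x^2}x^3}\Big\}, & R < x.\end{cases}$$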



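Since $f_0(x) \le M < \infty$, we also have that $\log\frac{f_0(x)}{f_m(x)} \ge \log\frac{f_0(x)}{M}$. Further, as $\log\frac{f_0(x)}{M} \le 0$, we have $\big|\log\frac{f_0(x)}{f_m(x)}\big| \le \max\{\xi(x), |\log\frac{f_0(x)}{M}|\}$.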

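By Condition B5, $\int|\log\frac{f_0(x)}{M}|f_0(x)\,dx = \log M - \int f_0\log(f_0)\,dx < \infty$. Now, consider $\int\xi(x)f_0(x)\,dx$, which equals
$$\int_{R^{-1}}^{R}f_0(x)\log\frac{f_0(x)}{C(x)\phi_\delta(x)}\,dx + \int_0^{R^{-1}}f_0(x)\max\Big\{\log\frac{f_0(x)}{C(x)\phi_\delta(x)}, \log(f_0(x)) + \log\big(ex\Gamma(x^{-2}+1)\big)\Big\}dx$$
$$+ \int_R^\infty f_0(x)\max\Big\{\log\frac{f_0(x)}{C(x)\phi_\delta(x)}, \log(f_0(x)) - \log(e^{-x^2}x^3)\Big\}dx$$
$$\le \int f_0(x)\log\frac{f_0(x)}{\phi_\delta(x)}\,dx + \int f_0(x)\log\frac{1}{C(x)}\,dx + \int_{(0,R^{-1}]\cap A}f_0(x)\log\big(ex\Gamma(x^{-2}+1)f_0(x)\big)\,dx + \int_{(R,\infty)\cap B}f_0(x)\log\frac{f_0(x)}{e^{-x^2}x^3}\,dx, \qquad (24)$$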
where $A = \{x : f_0(x) > [ex\Gamma(x^{-2}+1)]^{-1}\}$ and $B = \{x : f_0(x) > e^{-x^2}x^3\}$. The above relation (24) holds since $C(x) < 1$ by Lemma 8 and $\max(x_1, x_2) \le x_1 + \max(x_2, 0)$ if $x_1 \ge 0$.

The first term on the RHS of (24) is finite by Condition B6*. By Lemma 8, the second term on the RHS of (24) is also finite. Note that, by Stirling's inequality (see [8, vol. I, pp. 50-53]),
$$\big|\log\big(ex\Gamma(x^{-2}+1)\big)\big| \le |\log x| + 1 + \log(2\pi) + (x^{-2}+1)\log(x^{-2}+1) + \frac{1}{12(x^{-2}+1)}$$
for $0 < x < 1$. Hence, the third term on the RHS of (24) is finite by Condition B7*. Similarly, so is the fourth term. By Lemma 6, we have that $f_m \to f_0$ pointwise. Thus, by the DCT, $\int f_0(x)\log\frac{f_0(x)}{f_m(x)}\,dx \to 0$ as $m \to \infty$. □

Lemma 6. Let $f_m(x)$ be defined as in (15); then $f_m(x) \to f_0(x)$ as $m \to \infty$ for each $x > 0$.

To prove this lemma, we need the lemma below, which generalizes Theorem 2.1 of DeVore and Lorentz (1993) in two respects: the functions $K_m$ and $f$ are considered on a possibly non-compact $\mathcal{X}$, and the intervals $A_m$ can vary with $m$.

Lemma 7. Let $A_m = [a_m, b_m] \subset \mathcal{X}$, and let $K_m(x, t)$ be a sequence of continuous functions for $x \in \mathcal{X}$ and $t \in A_m$. Define $f_m(x) = \int_{A_m}K_m(x, t)f(t)\,dt$, $m = 1, 2, \ldots$, where $f$ is bounded, uniformly continuous and integrable on $\mathcal{X}$. If $K_m$ satisfies

C1. $\int_{A_m}K_m(x, t)\,dt \to 1$ as $m \to \infty$,
C2. for each $\delta > 0$, $\int_{|x-t|>\delta,\,t\in A_m}|K_m(x, t)|\,dt \to 0$ as $m \to \infty$,
C3. $\int_{A_m}|K_m(x, t)|\,dt \le M(x) < \infty$ for each $x \in \mathcal{X}$, $m = 1, 2, \ldots$, where the bound $M(x)$ may depend on $x$,

then $f_m(x) \to f(x)$ for each $x \in \mathcal{X}$.

Proof. Let $\epsilon > 0$ be given and let $\delta > 0$ be so small that $|f(t) - f(x)| < \epsilon$ for $|x - t| < \delta$. Because of Condition C1,
$$f_m(x) - f(x) = \int_{A_m}[f(t) - f(x)]K_m(x, t)\,dt + o(1),$$
where the last term goes to 0 as $m \to \infty$, for each $x \in \mathcal{X}$. We have
$$\Big|\int_{|x-t|<\delta,\,t\in A_m}[f(t) - f(x)]K_m(x, t)\,dt\Big| \le \epsilon\int_{|x-t|<\delta,\,t\in A_m}|K_m(x, t)|\,dt \le \epsilon M(x).$$
It follows from Condition C2 that for each $\delta > 0$ and any bounded continuous function $f^*$ on $\mathcal{X}$, $\int_{|x-t|\ge\delta,\,t\in A_m}f^*(t)K_m(x, t)\,dt \to 0$ as $m \to \infty$. Hence,
$$\int_{|x-t|\ge\delta,\,t\in A_m}[f(t) - f(x)]K_m(x, t)\,dt \to 0 \quad \text{as } m \to \infty.$$
By Condition C3 it now follows that $|f_m(x) - f(x)| \le \epsilon M(x) + o(1)$, and hence the result. □
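Proof of Lemma 6. Let $v = (a-1)m^{-1}$ and $u = m^{-1}$. Let
$$K(x; v, u) = \frac{x^{v/u}e^{-x/u}}{\Gamma(v/u+1)\,u^{v/u+1}}$$
and $K_m(x; v) = K(x; v, m^{-1})$, where $v \in A_m$ and $A_m = [m^{-1}, m]$. Now $f_m(x) = t_m\int_{m^{-1}}^{m}K_m(x; v)f_0(v)\,dv$ with $t_m \to 1$, so it suffices to show that $K_m(x; v)$ satisfies Conditions C1-C3 in Lemma 7.

Given $x > 0$, consider expression (18). For $m$ sufficiently large, such that $m^{-1} < x < m + m^{-1}$, we have
$$\frac{\partial}{\partial v}\log K_m(x; v) > 0 \ \text{ for } m^{-1} < v < x - m^{-1}, \qquad \frac{\partial}{\partial v}\log K_m(x; v) < 0 \ \text{ for } m > v > x - m^{-1} + \rho, \qquad (25)$$
where $\rho$ is some small positive number. Also, note that $\frac{\partial^2}{\partial v^2}\log K_m(x; v) < 0$ for all $x > 0$ and $m^{-1} < v < m$. Thus, the first-order derivative changes from positive to negative as $v$ increases from $m^{-1}$ to $m$ for given $x$ and sufficiently large $m$. Hence there exists $m_0$ such that $K(x; v, m^{-1})$ is increasing as a function of $vm$ when $vm < m_0$ and decreasing when $vm > m_0$. For sufficiently large $m$,
$$e^{-xm}\Big[\sum_{t=0}^{m^2}\frac{(xm)^t}{t!} - \frac{(xm)^{[m_0]+1}}{([m_0]+1)!}\Big] \le \int_{m^{-1}}^{m}e^{-xm}\frac{(xm)^{vm}}{\Gamma(vm+1)}\,d(vm) \le e^{-xm}\Big[\sum_{t=0}^{m^2}\frac{(xm)^t}{t!} + \frac{(xm)^{[m_0]+1}}{([m_0]+1)!}\Big], \qquad (26)$$
where $[z]$ stands for the largest integer less than or equal to $z$. Using the expression for the remainder of Taylor's series, we have that the LHS of (26) is at least
$$1 - e^{-xm}\frac{(xm)^{[m_0]+1}}{([m_0]+1)!} - e^{(x^*-x)m}\frac{(xm)^{m^2+1}}{(m^2+1)!}, \qquad (27)$$
where $x^* \in (0, x)$. It is obvious that the expression in (27) tends to 1 as $m \to \infty$. Similarly, we have that the RHS of (26) tends to 1 as $m \to \infty$. Hence,
$$\int_{A_m}K_m(x; v)\,dv = \int_{m^{-1}}^{m}e^{-xm}\frac{(xm)^{vm}}{\Gamma(vm+1)}\,d(vm) \to 1 \quad \text{as } m \to \infty,$$
that is, Condition C1 is satisfied.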
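Condition C1 for this kernel can also be checked numerically. Since $K_m(x; v)\,dv = e^{-xm}(xm)^{vm}/\Gamma(vm+1)\,d(vm)$, the Python sketch below integrates this "continuous Poisson" weight over $vm \in [1, m^2]$ with the midpoint rule. It is illustrative and uses the hypothetical choice $x = 1$.

```python
import math

def kernel_mass(x, m, n=100_000):
    """int_{1/m}^{m} K_m(x; v) dv, computed in the variable u = v m over [1, m^2]."""
    lam = x * m
    lo, hi = 1.0, float(m * m)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += math.exp(-lam + u * math.log(lam) - math.lgamma(u + 1.0))
    return total * h

for m in (5, 10, 30):
    print(m, kernel_mass(1.0, m))
```

The shortfall from 1 comes mainly from the omitted range $vm \in [0, 1)$, whose weight is of order $e^{-xm}$.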

From the above, we also know that Condition C3 is satisfied, since $K_m(x; v) > 0$ for all $v \in A_m$ and $x \in \mathcal{X}$.

To verify Condition C2, for any $\delta > 0$ and $x \in \mathcal{X}$, we want
$$\int_{|x-v|>\delta,\,v\in A_m}K_m(x, v)\,dv = \int_{|x-v|>\delta,\,v\in A_m}\frac{m\,e^{-xm}(xm)^{vm}}{\Gamma(vm+1)}\,dv \to 0,$$
as $m \to \infty$. Since $A_m$ has length less than $m$, it suffices to show that for any $\delta > 0$,
$$m^2\sup_{|x-v|>\delta,\,v\in A_m}\frac{e^{-xm}(xm)^{vm}}{\Gamma(vm+1)} \to 0 \quad \text{as } m \to \infty,$$
which is equivalent to showing that
$$2\log m + \log\frac{e^{-xm}(xm)^{vm}}{\Gamma(vm+1)} \to -\infty \quad \text{uniformly for } v \in A_m,\ |x-v| > \delta.$$
For any $v$ such that $v \in A_m$ and $|x - v| > \delta$, Stirling's inequality for factorials gives $\Gamma(vm+1) \ge (vm)^{vm}e^{-vm}$, and hence
$$2\log m + \log\frac{e^{-xm}(xm)^{vm}}{\Gamma(vm+1)} \le 2\log m + vm\log(xm) - xm - vm\log(vm) + vm = 2\log m + \{1 + \log(x/v) - x/v\}vm \to -\infty,$$
as $m \to \infty$. This holds since, for any given $x$ and $\delta$, there exists $q < 0$ such that $1 + \log(x/v) - x/v < q$ for all $v \in A_m$ with $|x - v| > \delta$.

Thus Conditions C1-C3 in Lemma 7 are all satisfied and we have that $f_m(x) \to f_0(x)$ as $m \to \infty$ for each $x > 0$. □

Lemma 8. Let $K_m(x; a)$ be defined as in Section 11. If Condition B7* is satisfied, then there exists a function $0 < C(x) < 1$ such that
$$C(x) \le \begin{cases}\int_x^{x+\delta}K_m(x; mv+1)\,dv, & m^{-1} \le x < 1,\\ \int_{x-\delta}^{x}K_m(x; mv+1)\,dv, & 1 \le x \le m + m^{-1},\end{cases} \qquad (28)$$
and $\int\log\frac{1}{C(x)}f_0(x)\,dx < \infty$.

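Proof. For $m^{-1} \le x < 1$, we apply Stirling's inequality and note that $v \le x + \delta < 1 + \delta$ in the following integral. It follows that
$$\int_x^{x+\delta}K_m(x; mv+1)\,dv = \int_x^{x+\delta}\frac{m^{mv+1}x^{mv}e^{-mx}}{\Gamma(mv+1)}\,dv \ge \int_x^{x+\delta}\frac{m^{mv+1}x^{mv}e^{-mx}}{\sqrt{2\pi}(mv+1)^{mv+1/2}\exp\{-(mv+1) + (12x)^{-1}\}}\,dv$$
$$= \frac{\sqrt{m}\exp(1 - (12x)^{-1})}{\sqrt{2\pi}}\int_x^{x+\delta}\frac{x^{mv}e^{m(v-x)}}{(v+m^{-1})^{mv+1/2}}\,dv \ge \frac{\sqrt{m}\exp(1 - (12x)^{-1})}{\sqrt{2\pi(1+\delta)}}\int_x^{x+\delta}\frac{x^{mv}e^{m(v-x)}}{(v+m^{-1})^{mv}}\,dv. \qquad (29)$$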
Note that
$$\int_x^{x+\delta}\frac{x^{mv}e^{m(v-x)}}{(v+m^{-1})^{mv}}\,dv = \int_x^{x+\delta}\exp\Big\{mv\Big[\log\frac{x}{v+m^{-1}} - \Big(\frac{x}{v+m^{-1}} - 1\Big)\Big] - \frac{x}{v+m^{-1}}\Big\}dv \ge \int_x^{x+\delta}\exp\Big\{\frac{-mv(x-v-m^{-1})^2 - 2x^2}{2x(v+m^{-1})}\Big\}dv.$$
The above inequality holds because, for $0 < u < 1$,
$$\log u - (u-1) = -\sum_{k\ge2}\frac{(1-u)^k}{k} \ge -\frac{(1-u)^2}{2}\{1 + (1-u) + (1-u)^2 + \cdots\} = -\frac{(1-u)^2}{2u}.$$
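Since $1 + \delta > x + \delta > v > x$ in the following integral, we have that
$$\int_x^{x+\delta}\exp\Big\{\frac{-mv(x-v-m^{-1})^2 - 2x^2}{2x(v+m^{-1})}\Big\}dv \ge \int_{x+m^{-1}}^{x+\delta+m^{-1}}\exp\Big\{\frac{-m(1+\delta)(x-\tilde v)^2 - 2x^2}{2x^2}\Big\}d\tilde v$$
$$= \frac{\sqrt{2\pi}\,x}{e\sqrt{m(1+\delta)}}\Big[\Phi\Big(\frac{\delta+m^{-1}}{x/\sqrt{m(1+\delta)}}\Big) - \Phi\Big(\frac{m^{-1}}{x/\sqrt{m(1+\delta)}}\Big)\Big], \qquad (30)$$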



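where $\tilde v = v + m^{-1}$ and $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution. For $m$ large, such that $\delta > m^{-1/2}$,
$$\Phi\Big(\frac{\delta+m^{-1}}{x/\sqrt{m(1+\delta)}}\Big) - \Phi\Big(\frac{m^{-1}}{x/\sqrt{m(1+\delta)}}\Big) \ge \Phi(2\sqrt{1+\delta}\sqrt{\delta}/x) - \Phi(\sqrt{1+\delta}\,\delta/x) \ge \Phi(2\sqrt{1+\delta}\,\delta/x) - \Phi(\sqrt{1+\delta}\,\delta/x). \qquad (31)$$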



Y. Wu and S. Ghosal/Kullback Leibler property of kernel mixture priors



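The last inequality holds since we chose $\delta < 1$. Now, for $u > 0$, consider $\Phi(2u) - \Phi(u)$, and let $\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$ be the standard normal p.d.f. By the fact that
$$\frac{x}{1+x^2}\phi(x) < 1 - \Phi(x) < \frac{1}{x}\phi(x), \quad x > 0, \qquad (32)$$
we have that
$$\Phi(2u) - \Phi(u) > \frac{u}{1+u^2}\phi(u) - \frac{1}{2u}\phi(2u) > \frac{1}{2u}\phi(2u).$$
Hence, with $u = \sqrt{1+\delta}\,\delta/x$, the RHS of (31) is greater than
$$\frac{x}{2\delta\sqrt{2\pi(1+\delta)}}\exp\Big(-\frac{2(1+\delta)\delta^2}{x^2}\Big). \qquad (33)$$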
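The Gaussian tail bounds (32), and the inequality $\Phi(z) - 1/2 > z\phi(z)$ for $z > 0$ used later in this proof, are standard facts. The Python sketch below is illustrative and checks them at a few points, using $1 - \Phi(x) = \frac12\operatorname{erfc}(x/\sqrt2)$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.1, 0.5, 1.0, 2.0, 5.0):
    lower, tail, upper = x / (1.0 + x * x) * phi(x), upper_tail(x), phi(x) / x
    print(f"x={x:4.1f}  {lower:.3e} < {tail:.3e} < {upper:.3e}")
    # Phi(x) - 1/2 = 1/2 - (1 - Phi(x)) exceeds x * phi(x) for x > 0
    print(f"        Phi(x) - 1/2 = {0.5 - tail:.3e} > x phi(x) = {x * phi(x):.3e}")
```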



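Now, combining the expressions (29), (30) and (33), it follows that
$$C(x) = \frac{x^2}{2\delta(1+\delta)\sqrt{2\pi(1+\delta)}}\exp\Big\{-\frac{1}{12x} - \frac{2(1+\delta)\delta^2}{x^2}\Big\}, \quad 0 < x < 1, \qquad (34)$$
satisfies (28) for $m^{-1} \le x < 1$.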

Now let $m + m^{-1} \ge x \ge 1$. Applying Stirling's inequality, we have that
$$\int_{x-\delta}^{x}K_m(x; mv+1)\,dv = \int_{x-\delta}^{x}\frac{m^{mv+1}x^{mv}e^{-mx}}{\Gamma(mv+1)}\,dv \ge \int_{x-\delta}^{x}\frac{m^{mv+1}x^{mv}e^{-mx}}{\sqrt{2\pi}(mv+1)^{mv+1/2}\exp[-(mv+1) + (12x)^{-1}]}\,dv$$
$$\ge \frac{\sqrt{m}\exp(1 - (12x)^{-1})}{\sqrt{2\pi(x+\delta)}}\int_{x-\delta}^{x}\exp\Big\{mv\Big[\log\frac{x}{v+m^{-1}} + \Big(1 - \frac{x}{v}\Big)\Big]\Big\}dv, \qquad (35)$$
since $v + m^{-1} < x + \delta$ when $m > \delta^{-1}$. Note that, as shown above, $\log u - (u-1) \ge -(1-u)^2/(2u)$ for $0 < u < 1$. Further, $\log u - (u-1) \ge -(u-1)^2/2$ for $1 \le u < 2$, since $\frac{d}{du}\{\log u - (u-1) + (u-1)^2/2\} = (u-1)^2/u \ge 0$. Note that $0 < \frac{x}{v+m^{-1}} \le \frac{x}{x-\delta} < 2$, where $\delta < 1/2$ without loss of generality. Now it follows that

X 



log 



V + m 
log 



1-^ 

V 



> 



V + m ^ ) \v ^ m 

1 V + m^^ — X 

2 X 



V + m 

2 



- - 1 



{v + m ^)mv 



Letting v denote v + m ^, the RHS of (36) is equal to 



(a; — 2v){x — v) x 



2xv 



nivv 



> 



{x - if 



mvv 



since ^ ^ > — 1 for v < x + m (i.e. v < x) and a; > 1. Now, 

{•xfwa r f 

mv < log 



exp 



x—S 



V 



(2;Am)+m ^ 

exp 



w + m 1 



exp 



2(a;-(5)2 5 
mx{x — u)^ 



a; — (5+m 

2^ 



dv 



{x-5){<^ 



> ei/^/^(a; - <5) 
V mx 



2(a;-(5)2 
/ma; ((5 — m~^) 



2{x - 6) 



x6' 



{x~6y 



(36) 



(37) 



for $m/2 > \delta^{-1}$, since $\Phi(z) - 1/2 > \phi(z)z$ for any $z > 0$ and since $x \ge v > 1 - \delta$ for all $x \ge 1$. Combining expressions (35) and (37) and simplifying, we conclude that

(Sexp(3/2- 12x-i) 



C{x) 



■ exp 



{x - sy 



x> 1, 



(38) 



2.y2TT{x + 6) 

satisfies (28) for $1 \le x \le m + m^{-1}$.

Now $C(x)$ defined by (34) and (38) satisfies (28). Further, by straightforward calculations, $\int\log\frac{1}{C(x)}f_0(x)\,dx < \infty$ under Condition B7*. □